Please contact us at [mtgeloproject AT gmail DOT com], especially if you have a piece of information like:
- I played in an event that’s not listed under my name.
- I didn’t play in an event that is listed as me.
- The GP listed under [NAME] actually belongs to [NAME].
- [NAME] and [NAME] are the same person.
- I’m in the system as [NAME] but I’d prefer for my name to be styled like [Name].
- In randomly clicking around, I saw that [NAME] doesn’t appear in this GP until round 11.
All of that is useful. We’re trying to make the data as accurate as we can.
Note that because the database is so big (over 3,650,000 matches), it is cumbersome to refresh the ratings. Since every future match could be affected by changes in the past, we need to rerun everything to fix errors. For that reason we’ll be consolidating errors that are brought to our attention and fixing them as part of a periodic refresh.
The data comes from the results pages in the event coverage of each event, like this one. Note that pre-pandemic coverage data only includes names, with no unique identifier like a DCI number. So we had to decide for ourselves which results belonged to the same player. We did our best to combine entries we reasonably thought were the same person, but we only have so much to go on. We were pretty timid about combining anyone with a very common name like Johnson, Miller, Williams, etc., so we need some help from the community to improve those entries.
For post-pandemic events hosted on Melee, players do have a unique identifier in their username. On the other hand, deciding how to match Melee results to pre-Melee results is not always obvious. Additionally, some players seem to have multiple Melee accounts. So while Melee has helped in the curating process, it isn’t a cure-all.
We also made some guesses based on geography about whether some people were the same. For instance, if a name played in Melbourne, Sydney, Toronto 2015, and Toronto 2016, we split the Australian entries from the Canadian ones. Unfortunately, not all GPs include the nationality of the participants, in many instances the nationality is incorrect, and Melee doesn’t include nationality at all, so we couldn’t really use that as a method to distinguish people.
No problem! We feel okay about gathering this data in one place in general, since we’ve used nothing that isn’t publicly available, but if you’d rather not be searchable we totally understand. Just send us an email at [mtgeloproject AT gmail DOT com] and we’ll happily change you to “anonymous” in the system.
The dataset includes several different tournament series:
- Every individual Pro Tour from 1999-present, including a couple from pre-1999.
- Every individual Grand Prix from 2000-2020, including a couple from pre-2000.
- Magic Online Championship Series events, 2010-2019.
- National Championships and Continental Championships with extant data, 1999-2018. [Link to blog post with information about what survived.] (There were three continental championship series: the European championship, APAC championship, and LatAm championship.)
- Every Regional Championship from 2022-present.
- Each Pro Tour Qualifier from MagicCons, 2023-present.
- Practically every individual event in the SCG Open Series, as well as every SCG Invitational and SCG Player’s Championship.
- Events under the heading of “two-day destination RCQs”: MXP Opens, SCG CON 20Ks, and Legacy EU Grand Open Qualifiers.
- Individual Masters Series events, 2000-2003.
See the next question for information on which early tournaments we’re missing.
Initially we planned for the site to only include Grand Prix (and actually almost registered the website www.grandprixelo.net). Ultimately we expanded the project for a variety of reasons; foremost, it became clear that the true mission of the site is to preserve Magic’s tournament history, and to that end including whatever data we could find still surviving on the internet seemed like a more valuable endeavor than the purity of the data. It is true that from a ratings perspective, the various tournaments are not all exactly “like terms” (different stakes, different testing methods, different clientele). We could have fiddled with the formula to give different weight to matches in different types of tournaments, but nothing that we’ve tried so far has improved the predictive power of the rating system, so those ideas were tabled. This is a question that we’re hoping to re-address soon.
Here’s a timeline of the first few years of professional Magic. Anything not colored black is missing.
As far as PTs/GPs go, everything following gpkl00 is present. As far as I’ve been able to tell, the purple ones are the only others to have been on the internet at some point. ptchi97 had coverage, but day one was never posted (technical issues at the tournament site on the day of the event). ptchi98 had coverage, but day one has been lost. ptny98 had coverage, but day two has been lost. As for Grand Prix, gpsf99 is in tatters and isn’t salvageable, and I haven’t actually found a working link to gpams99 tournament coverage (someone messed up and had it link to gpbar99, and because of that the Wayback Machine never crawled the Amsterdam coverage). Original coverage from both gpant98 and gpzur98 exists, but not in a form robust enough for the purposes of the site.
In addition, the following individual rounds are missing:
- GP Barcelona 1999, round 1
- GP Kansas City 1999, rounds 1, 2, and 3
- GP Cape Town 2001, round 11
- GP Singapore 2007, round 1
- GP Costa Rica 2012, round 2
For SCGs, there are a couple of events whose old data was either not preserved or corrupted. (The ones that are not preserved could actually still be hosted on static.starcitygames.com somewhere, but I haven’t been able to find them.) The missing events are
- SCG New Orleans 2012 (October 27-28, 2012) (two events),
- SCG St. Louis 2012 (#2) (November 3-4, 2012) (two events),
- SCG Dallas 2012 (#2) (November 10-11, 2012) (two events),
- SCG Seattle 2012 (#2) (November 17-18, 2012) (two events),
- SCG Baltimore 2012 (#2) (standard) (December 1, 2012) (the legacy event on 12/2 exists),
- SCG Cincinnati 2013 (#1) (February 16-17, 2013) (two events), and
- SCG Charlotte 2013 (May 11-12, 2013) (two events) (this one bums me out as the 5/12 Open was the only individual sealed Open ever).
There are approximately twelve missing SCG rounds. Some of them I tried to rebuild from tiebreakers, so they are partially in the database.
These events aren’t on the site but will be before long.
- MOCS Champions Showcases, 2020-present
- Arena Championships, 2021-present
- MPL League matches, 2020-2021
- Something from the NRG Series
- Something from Axion Now
It seems that since the start of 2024, SCG CONs have no longer featured a two-day main event. We may ultimately decide to add the one-day 10Ks to continue getting data from these events.
Each person enters the system with a rating of 1500. After each match, the winning player takes points from the losing player. The amount of points at stake is determined by the difference in the players’ ratings.
Each person enters the system with a rating of 1500. When a match is played, Elo estimates how likely it is that each player will win the match. This is based on the difference in the ratings: if player A has rating RA and player B has rating RB, Elo assigns player A a likelihood of winning equal to

P(A wins) = 1 / (1 + 10^((RB - RA)/c)), where c ≈ 1135.77.
The number in the denominator is chosen specifically so that a 200-point rating difference corresponds to a 60% win expectancy. This is different from chess ratings, which are calibrated so that a ten-to-one favorite in a match will be rated exactly 400 points above his competitor. It didn’t seem to us like there could ever be a situation where someone is 91% to win a match of Magic—there’s too much variance in the game. This is the same normalization used in @Quick_MTGStat’s PT Elo table (rip). We elected to use a strange number (in place of the round 400) in order to have a nice interpretation for what 200 points “means”. Here’s a table of interpretations for other round percentages:

| Win expectancy | Rating difference |
| --- | --- |
| 55% | ≈ 99 points |
| 60% | 200 points |
| 65% | ≈ 305 points |
| 70% | ≈ 418 points |
| 75% | ≈ 542 points |
Roughly speaking, each 100 points corresponds to a further five percent. So for example, if a hypothetical correctly-rated player with a rating of 1750 were to play a hypothetical correctly-rated player with a rating of 1550 over and over again, the player with the 1750 rating would win 60% of the matches. (This is a long-term percentage, just like if you flip a coin “a lot of times” you’ll see 50% heads and 50% tails.)
After each match the ratings are updated with the new result. The winning player receives points from the losing player in proportion to the win probability their opponent had. The constant of proportionality is K = 36. Continuing the 1750 vs. 1550 example, if the 1750-rated player wins, she receives 36 * 0.4 = 14.4 points and her opponent loses 14.4 points. If the 1550-rated player wins, he receives 36 * 0.6 = 21.6 points and the higher-rated player loses 21.6 points. The numbers 14.4 and 21.6 are in the correct proportion to keep the ratings stable if they were to play a long series of matches and win at the rates expected by the model. Internally the database keeps track of fractions, but we round everything off to the nearest integer when it’s displayed to try to avoid distractions.
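If you prefer code to prose, here’s a minimal sketch of the whole update in Python (our own illustration of the formulas above, not the site’s actual implementation; the constant is just 200/log10(1.5) ≈ 1135.77, per the 60%-at-200-points calibration):

```python
from math import log10

SCALE = 200 / log10(1.5)  # ≈ 1135.77; makes a 200-point gap a 60% favorite
K = 36

def expected_score(rating, opp_rating):
    """Win probability the model assigns to the first player."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / SCALE))

def update(rating, opp_rating, score):
    """New rating; score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + K * (score - expected_score(rating, opp_rating))

# The 1750 vs. 1550 example from above:
print(update(1750, 1550, 1))  # ≈ 1764.4 (the favorite gains 36 * 0.4 = 14.4)
print(update(1550, 1750, 1))  # ≈ 1571.6 (the underdog gains 36 * 0.6 = 21.6)
```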
The value 36 was chosen for K because it seems to do the best job at giving the ratings the predictive power they are meant to have: looking at matches played between people who had already played at least ten matches (“veteran” players), a player rated 35-65 points above their opponent won 52.53% of the time, with an 85-115 point difference the higher-rated player won 54.76% of the time, etc. These numbers came out closest to the intended percentages when we set K = 36. We messed around with using a different value of K for Pro Tours, but nothing we’ve tried yet has gotten us a better fit to the underlying model.
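A calibration check along those lines might look something like this sketch (the match-tuple format here is our own invention for illustration, not the site’s actual data layout):

```python
from math import log10

SCALE = 200 / log10(1.5)

def calibration(matches, lo=35, hi=65):
    """Compare the favorite's actual win rate to the model's prediction for
    matches whose rating gap falls in [lo, hi]. `matches` is a list of
    (higher_rating, lower_rating, higher_player_won) tuples."""
    bucket = [m for m in matches if lo <= m[0] - m[1] <= hi]
    if not bucket:
        return None
    empirical = sum(won for _, _, won in bucket) / len(bucket)
    predicted = sum(1 / (1 + 10 ** ((b - a) / SCALE))
                    for a, b, _ in bucket) / len(bucket)
    return empirical, predicted

# Run over matches between "veterans"; with ratings generated at K = 36,
# the 35-65 point bucket came out at 52.53% empirical.
```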
To see how your rating compares to other players in our database, check out the histogram and percentile tables on the statistics page. For example, if your rating is 1575, and you see that the 85th percentile is 1565 while the 90th percentile is 1602, you can estimate that you’re in approximately the top 12.5% of players in our system. Note that a lot of players only participated in one tournament, so the “veterans” data may be more relevant.
Players whose rating is over 1700 have their ranking appear on their profile page.
No. Rating updates are based only on the match result, not the game score.
Unintentional draws are included as half of a win for each player. This has a minor effect on ratings, which you can see as many draws have a delta of +1 or +2. (Many change the ratings by less than 0.5 and so are reported as 0 due to rounding. Rest assured whatever tiny effect they should have is included behind the scenes.)
As an example, suppose two players whose ratings differ by 200 points were to battle to a draw. The higher-rated player had a 60% win expectancy for the match, while the lower-rated player should win the other 40% of the time. In a draw, the higher-rated player received 0.5 wins, which is 0.1 short of his expected value of 0.6. The net result is -0.1, and so his rating changes by -0.1 times K, or -3.6 for us. Meanwhile the lower-rated player enjoys a boost of 3.6 points from the result. A draw between players whose ratings differ by 54.86 points will result in a change of exactly one point, and over 73% of draws were worth less than that.
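As a standalone snippet (same constants as in the sketch above), the draw arithmetic is:

```python
from math import log10

SCALE = 200 / log10(1.5)
K = 36

higher, lower = 1700, 1500                             # any 200-point gap works
expected = 1 / (1 + 10 ** ((lower - higher) / SCALE))  # 0.6 for the favorite
print(K * (0.5 - expected))        # -3.6: a draw costs the favorite points
print(K * (0.5 - (1 - expected)))  # +3.6: and gives them to the underdog
```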
Intentional draws do not affect players’ ratings. I did my best to find all of the intentional draws (9834 and counting as of this writing), but some undoubtedly have slipped through the cracks. If you took an ID in a match and it does not appear as such, let us know and we’ll correct it.
Our current policy is no; part of what you give up when you scoop a match is the Elo. We’ve come to this policy for a few reasons:
- Unlike IDs, which I do try to distinguish, concessions appear in tournament results as regular losses. If the site is going to be a record of how tournaments played out, then it should record what the tournament thought was the result of the match.
- I don’t want to put myself in the position of ever having to guess at what wins were actually concessions, especially given the first point. If concessions had to be marked, then I’d have to try to figure out if and when they occur.
- Given that the system for marking concessions would have to be “contact us and tell us what needs to be marked”, basically none of the concessions from old events would ever get marked. For events that are 20+ years old, it’s possible that no one even remembers a concession happened.
- A self-reporting system for concessions would give an advantage to those who know to report concessions in the first place.
You may or may not agree with this, but this is where we’re coming from.
No. Only matches played between two people will move the ratings. This is why well-known players’ results start at round 3 or round 4 in many Grand Prix.
No, for two reasons. First, many of the team results pages only list last names, and if it was a nightmare trying to decide who’s who with access to the full names, just imagine what that would be like to curate. Second, and more importantly, team events and individual events are not like terms; the result of a team match doesn’t tell you anything at the individual level. So updating what are intended to be individual ratings with results from team matches doesn’t really make sense.
For team GPs/PTs specifically, we did put together a Google sheet with some stats for the most prolific players. If you’d like your own team history to be included in this, let me know. I doubt I’m ever going to curate team SCGs but never say never.
The ratings are pretty insensitive to mistakes that don’t involve you. If there’s an event you played in recently that wasn’t showing up under your name, merging it in could naturally cause a big shift to your rating. But the fact that some people from another continent aren’t combined correctly will almost certainly not do anything to your rating. If you lost to “zzVIP-Luis ScottVar” who had a 1500 rating but it should have been “Luis Scott-Vargas” with an 1800 rating, that would change your rating after that match by 5 or 6 points. If you played more after that incorrect entry, the six-point mistake will quickly get washed out. (To our credit, almost no curating mistakes of that sort are left in the database.)
That is our subtle way of indicating personal best ratings. Mouse over the delta to see the rating after that match. If a player does not have a green box, that means that unfortunately that player started off with a loss and never climbed back to 1500. We considered indicating personal lows with a red outline, but that seemed a little mean…
And while we’re here, a couple of other things to note about the event tables: clicking on the name of the opponent will, not shockingly, take you to that person’s page. Less obviously, clicking on the result will take you to the head-to-head between those two players. (Usually not very exciting, since most people who have ever played each other did so exactly once.) You can also mouse over the “Δ” column to see the rating as of that match, or mouse over the round number to see the format of the match.
It is true that the predecessor to the current Planeswalker Points system was a kind of Elo rating scheme, and that Wizards scrapped that system. There were several problems with that system, the worst of which was that benefits were tied to your rating. Since the rating mattered and it’s hard to keep a very high rating when each win is worth significantly less than each loss, you were incentivized not to play in order to stay at your peak. But this was a problem with how the ratings were used, not with the fact that there were ratings. So we got around this problem by having the ratings be for entertainment purposes only.
One could also complain that Magic and Elo don’t mix well because there’s variance in Magic. The Elo scheme computes an expected win percentage based on the history of the two competitors. The fact that there’s high variance means that we should expect that it will take a large sample before the trends begin to emerge, but it doesn’t make the model less applicable. Elo is used to make ratings for a variety of games like baseball and Scrabble which have plenty of variance. It also means that we shouldn’t put much stock in its predictive power: it’s true that an 1800-rated player should beat a 1701-rated player 55% of the time, but it could take a lot of matches before the win rate begins to approach 55%. Therefore using our Elo ratings as any sort of handicapping or prognosticating tool is not recommended. Again, entertainment purposes only.
Not at this time. This would be relevant for a small percentage of people: only 27% of the entries in the database have played in more than four events (enough for there to even possibly be multiple limited and constructed events). You could similarly ask about a pro rating vs. GP-only rating vs. SCG rating etc., but again the tradeoff between the overhead required to calculate and store all these different ratings for each player versus how much it would enrich the site just hasn’t been favorable.
Ugh. We felt like we had four options:
- Strike them from the database, making it so that none of the matches ever happened.
- Leave them in the database, but don’t change the ratings based on the results of those matches.
- Update the ratings based on their results, but artificially set their current rating to 1500 after they were suspended. (This leaves ratings intact after every match, so they are adjusted properly.)
- Do nothing.
#1 is just a spiteful version of #2, so that was out. We can see arguments for the other three, but ultimately opted for #3. The ratings are just supposed to reflect what happened, and the wins and losses did happen. From another angle, #2 would rob points from the people who beat them while they had a very high rating. Ultimately, we didn’t want to be some sort of arbiter of which matches had an outcome tainted by cheating, so it felt best to just leave the match data as it happened. To use an imperfect sports analogy, baseball players who have admitted to steroids haven’t had home runs subtracted from their career totals. Now having said that, if we did nothing, some players who were convicted and suspended would appear at the top of the current/peak Elo leaderboards, and that didn’t really seem like a satisfying state of affairs.
We welcome suggestions about what should be done in these cases.
We weren’t expecting to see a bunch of rounds with no results, like Round 9 of GP Providence 2015. We did our best to reverse-engineer the outcome of each match based on how the standings changed from round to round.
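The core of that inference is simple, even though the real data is messier (byes, drops, and name typos all complicate it). A simplified sketch, with made-up standings:

```python
def infer_round(points_before, points_after):
    """Guess each player's result from the change in cumulative match points:
    +3 means a win (or a bye), +1 a draw, +0 a loss."""
    outcomes = {}
    for player, before in points_before.items():
        gained = points_after.get(player, before) - before
        outcomes[player] = {3: "win", 1: "draw", 0: "loss"}.get(gained, "unknown")
    return outcomes

# Hypothetical pair of players entering the mystery round at 12 match points:
print(infer_round({"Player A": 12, "Player B": 12},
                  {"Player A": 15, "Player B": 12}))
# {'Player A': 'win', 'Player B': 'loss'}
```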
At a slightly larger level of abstraction, we learned a lot about how the Large Event Reporter works based on all the inconsistencies in its output. More than we really cared to learn, but a lot of it was necessary to understand how to translate what appeared on the internet into what happened in the events.
(Perhaps naively?) We weren’t expecting that if you registered with VIP benefits, you’d get “ZZVIP” put next to your name in the standings, at the expense of the last few characters of your name. We also learned a lot about Spanish/Portuguese naming conventions after discovering that some GPs in Latin America tended to use both last names and some would only use the paternal surname. For recent Regional Championships Adam has had to learn things about Chinese transliterations, converting from Cyrillic to Latin, different possible spellings of Japanese names… it’s a specialized but extensive skill set.
To check for misspellings I wrote a program that looks for entries that are anagrams of each other. It has served us well, but it has also turned up some cool coincidences. (My favorite is Shen, Kejia anagramming to Hines, Jake.)
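The idea, roughly (an illustrative sketch, not the actual program): normalize each entry down to its sorted letters and flag any collisions for manual review.

```python
from collections import defaultdict

def signature(name):
    """Sorted letters of the name, ignoring case, spaces, and punctuation."""
    return "".join(sorted(c for c in name.lower() if c.isalpha()))

def anagram_groups(names):
    groups = defaultdict(list)
    for name in names:
        groups[signature(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

print(anagram_groups(["Shen, Kejia", "Hines, Jake"]))
# [['Shen, Kejia', 'Hines, Jake']]: same letters, so flagged for review
```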
We are a couple of passionate Magic grinders and software/web developers: Adam, who had the idea to do this and did all the scraping and organizing of data, and Rebecca, who helped create the database and website. The project began when we were in math grad school, but by now we’ve both graduated, taken our turns as professors, and moved on to the next phases of our careers. As for why, there are several reasons:
- It seemed like a fun summer project. (Surely it wouldn’t take more than a summer.)
- While there are plenty of ways to try to determine who’s the best at Magic right now, the Elo system is distinct in that it weighs every match the same, whether it’s for the title or you’re 2-6 and playing out the last round of an event. We were curious to what extent the Elo leaderboard would reflect common wisdom as to who is having the most impressive season.
- To our knowledge, if you’re curious, say, how a specific player has done in the last few PTs, that information is not readily available without searching through individual coverage pages. It seemed useful just to (try to) compile that data.
- It’s fun to be able to go to an event and afterwards take a look at the strength and experience of each of your opponents.
- Rebecca tried to tell Adam at a GP once that he was better than at least half the players in the room, and instead of taking the reassuring comment at face value, Adam decided to invest weeks into rating everyone who ever played in order to quantify her statement.
We know that the data is imperfect and the rating system has its flaws, but this has been a really rewarding project. If you have any suggestions for improving the site or the data, please let us know!