Methods

Identifying Who Mains What

Let's define this a little better. We want to find who mains a champion, or who could potentially be a good player of that champion. We will call such a summoner a "Main".
There is no clear-cut, universal definition of when you would say someone mains Veigar. Maybe he played 500 games on it. Maybe he consistently dealt a ton of damage in every match he played. These criteria, however, might not translate well across champions: you wouldn't expect Lulu to deal the same damage to champions as Viktor. To define a reasonable metric for evaluating a summoner's performance, we looked at the following numbers on a specific champion, adjusting how we weigh them on a champion-by-champion basis:
  • Number of games played;
  • Win rate;
  • Play rate compared to other champions;
  • Number of kills;
  • Number of assists; and
  • KDA.
After all of these numbers are loaded, we perform principal component analysis (PCA) on them (Wikipedia:PCA). Here is the result for Nidalee:
Each point is the analyzed data for one summoner. Close to the three corners of the spread, you should see dots indicating the players with the highest KDA, the most games played on Nidalee, and that poor guy with the lowest KDA. Interestingly, the summoner with the highest KDA is not at the very edge of the blob. This comes from how PCA combines all of the stats rather than looking at KDA alone, and is completely normal. Empirically, principal component 1 (PC1) appears to be sufficient to identify who is good on this champion. Therefore, those in the upper echelon of PC1 are considered mains. Oh, and yes, all of them are platinum+ players.
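To make the PC1 idea concrete, here is a minimal sketch in Python with NumPy. It is not our actual pipeline: the column standardization, the optional `weights` argument, the sign convention (`orient_col`), and the `top_fraction` cutoff are all assumptions made for illustration, since "upper echelon" is not a fixed number.

```python
import numpy as np

def pc1_scores(features, weights=None, orient_col=0):
    """Project standardized per-summoner stats onto the first principal component."""
    X = np.asarray(features, dtype=float)
    # Standardize each column so stats on different scales are comparable.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant columns
    Z = (X - X.mean(axis=0)) / std
    if weights is not None:
        # Hypothetical per-champion weights, one per stat column.
        Z = Z * np.asarray(weights, dtype=float)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Vt[0]
    # The sign of a principal axis is arbitrary. Flip it so a higher score
    # goes with a higher value in the reference column (an assumption).
    if pc1[orient_col] < 0:
        pc1 = -pc1
    return Z @ pc1

def pick_mains(scores, top_fraction=0.1):
    """Return indices of summoners in the top `top_fraction` by PC1 score."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - top_fraction)
    return np.flatnonzero(scores >= cutoff)

# Toy rows: games played, win rate, KDA (made-up numbers, not real data).
stats = [[10, 0.4, 1.0], [20, 0.5, 2.0], [30, 0.6, 3.0], [500, 0.9, 9.0]]
scores = pc1_scores(stats)
mains = pick_mains(scores, top_fraction=0.25)
```

Because PC1 mixes all of the columns, the summoner with the best single stat does not necessarily get the top score, which matches what the Nidalee plot shows.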

Picking Up and Dropping Champions

This is a bit more complicated than extracting the other stats, which are directly accessible from the raw API data (OK, I get it, some take a division, so technically that doesn't count as direct). Again, this comes down to how we quantify "picking up" and "dropping". Apparently PCA is not a good choice for analyzing this, so let's turn to artificial neural networks.
A simple thing to look at is the number of games a summoner played on a champion during time windows before and after the patch change. In particular, we looked at 3 patches before and 3 patches after. The following plot shows the number of games played during patches 5.13-5.15 (Weight 2) versus the number played during patches 5.10-5.12 (Weight 1).
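As a rough sketch of how the two weights could be counted, assume each match has already been reduced to a `(summoner, champion, patch)` tuple. That record shape, the patch strings, and the function name `window_counts` are hypothetical and do not reflect the raw API format.

```python
from collections import defaultdict

BEFORE = {"5.10", "5.11", "5.12"}  # window for Weight 1
AFTER = {"5.13", "5.14", "5.15"}   # window for Weight 2

def window_counts(matches, champion, before=BEFORE, after=AFTER):
    """Map each summoner to (games before, games after) on one champion."""
    counts = defaultdict(lambda: [0, 0])
    for summoner, champ, patch in matches:
        if champ != champion:
            continue
        if patch in before:
            counts[summoner][0] += 1
        elif patch in after:
            counts[summoner][1] += 1
        # Matches from patches outside both windows are ignored.
    return {s: tuple(c) for s, c in counts.items()}

# Made-up example records.
matches = [
    ("alice", "Nidalee", "5.10"),
    ("alice", "Nidalee", "5.14"),
    ("alice", "Nidalee", "5.15"),
    ("bob", "Nidalee", "5.12"),
    ("bob", "Veigar", "5.13"),
    ("carol", "Nidalee", "5.9"),
]
weights = window_counts(matches, "Nidalee")
```

Each value is one point in the plot: the first number is Weight 1 and the second is Weight 2.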
Overlaid on the data points is a trained self-organizing map, generated using the Matlab Neural Net Clustering add-on. We can use this result to identify 3 regions:
  • Green: Most probably picked up the champion;
  • Red: Most probably dropped the champion; and
  • Blue: Unaffected, hard to tell, maybe, etc. For simplicity let's just call this unchanged.
In addition, a summoner has to have at least 3 games before/after the patch in order to make our list.
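For readers without Matlab, the sketch below shows the general idea in Python with NumPy. It is a stand-in, not the Neural Net Clustering add-on: the 1-D node layout, the learning-rate and neighborhood schedules, and every function name are assumptions. The filter also assumes that "before/after" means "in at least one of the two windows", which is only one possible reading.

```python
import numpy as np

def filter_min_games(counts, min_games=3):
    """Keep (before, after) pairs with enough games in at least one window.

    Assumption: the threshold applies to either window, not both.
    """
    C = np.asarray(counts, dtype=float)
    return C[C.max(axis=1) >= min_games]

def train_som(data, n_nodes=3, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1-D self-organizing map; returns weights of shape (n_nodes, dim)."""
    X = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Start each node on a randomly chosen data point.
    nodes = X[rng.choice(len(X), size=n_nodes, replace=False)].copy()
    grid = np.arange(n_nodes)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighborhood shrinks over time
        for x in X[rng.permutation(len(X))]:
            # Best matching unit: the node closest to this sample.
            bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))
            # Gaussian neighborhood on the 1-D node grid.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            nodes += lr * h[:, None] * (x - nodes)
    return nodes

def nearest_node(data, nodes):
    """Index of the closest SOM node for every data point."""
    X = np.asarray(data, dtype=float)
    d = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Made-up (Weight 1, Weight 2) points, not real data.
pts = [[0, 8], [1, 9], [0, 10], [9, 0], [8, 1], [10, 0],
       [9, 9], [10, 10], [8, 9], [1, 2]]
kept = filter_min_games(pts)
nodes = train_som(kept, n_nodes=3)
labels = nearest_node(kept, nodes)
```

Which node ends up standing for green, red, or blue is not decided by the code. You would read it off the trained node weights, for example by checking whether a node sits at high Weight 2 and low Weight 1.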