Monte Carlo Analysis
During the tournament, the site presents projections of how the contest may turn out, based on a Monte Carlo simulation. Whenever results are updated, a background process simulates tournament outcomes and calculates the following values for each entry:
- Minimum/maximum score
- Minimum/maximum place (Note: this might not be from the same outcome as the min/max score.)
- Average score
- Standard deviation of scores
- How many sample tournament outcomes were used
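As a rough illustration of how the values above could be computed, here is a minimal Python sketch. It is not the site's actual code. The function name `summarize`, the input shape (one dict of entry name to score per simulated tournament), the tie rule (tied entries share the better place), and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def summarize(simulated_scores):
    """Summarize per-entry results across simulated tournaments.

    simulated_scores: list of dicts, one per simulated tournament,
    each mapping entry name -> that entry's score in that simulation.
    (Hypothetical input shape, chosen for this sketch.)
    """
    scores = {}
    places = {}
    for outcome in simulated_scores:
        for entry, score in outcome.items():
            # Assumed tie rule: place is 1 plus the number of entries
            # with a strictly higher score, so ties share a place.
            place = 1 + sum(1 for other in outcome.values() if other > score)
            scores.setdefault(entry, []).append(score)
            places.setdefault(entry, []).append(place)

    summary = {}
    for entry in scores:
        summary[entry] = {
            "min_score": min(scores[entry]),
            "max_score": max(scores[entry]),
            # Best and worst place are tracked independently of the
            # scores, so they may come from different simulations.
            "best_place": min(places[entry]),
            "worst_place": max(places[entry]),
            "average_score": mean(scores[entry]),
            # Population standard deviation is an assumption here.
            "score_std_dev": pstdev(scores[entry]),
            "samples": len(scores[entry]),
        }
    return summary
```

Because place and score are tracked separately, an entry's worst place can come from a simulation in which it did not post its lowest score, which matches the note in the list above.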
Game Outcome Probabilities
The probability of each game outcome used in the analysis is derived from the historical outcomes of the first two rounds of the tournament.
If there is enough historical data for a given matchup, the simulation uses that probability directly. For historical matchups to be used, a minimum number of games (currently 25) must have been played. For example, even though 12-seed vs. 13-seed matchups have historically gone 8-2 (through 2012), this is not considered a large enough sample to be used directly. Even with this threshold, there are still a number of outliers: for example, 1-seeds tend to beat 9-seeds far more often than the seed difference would predict.
As shown in the graph above, the relationship between seed difference and winning percentage is roughly linear. So, if there are not enough matchups to use a direct historical probability, the following formula is used:
HighSeedWinsProbability = [(LowSeed - HighSeed)/30 + 0.5]
Here HighSeed is the better (numerically lower) seed and LowSeed is the worse one. This works out so that 1-16 matchups (a difference of 15) end up at 100% in favor of the 1-seed, and same-seed matchups end up at 50%. It doesn't exactly agree with the linear regression of the historical matchup data, but it's close enough.
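The selection rule described in this section can be sketched in Python as follows. This is an illustrative sketch, not the site's implementation. The function name `high_seed_win_probability`, the `history` dict format (a seed pair mapped to wins by the better seed and games played), and the constant name `MIN_GAMES` are assumptions. Only the threshold of 25 games and the linear formula come from the text above.

```python
MIN_GAMES = 25  # minimum historical games before the record is used directly


def high_seed_win_probability(high_seed, low_seed, history=None):
    """Probability that the better seed wins the matchup.

    high_seed: the better (numerically lower) seed, e.g. 1
    low_seed:  the worse (numerically higher) seed, e.g. 16
    history:   optional dict mapping (high_seed, low_seed) to a tuple of
               (wins_by_high_seed, games_played); a hypothetical format.
    """
    if history is not None:
        wins, games = history.get((high_seed, low_seed), (0, 0))
        if games >= MIN_GAMES:
            # Enough history: use the observed winning percentage.
            return wins / games
    # Otherwise fall back to the linear formula from the text.
    return (low_seed - high_seed) / 30 + 0.5
```

With no history supplied, `high_seed_win_probability(1, 16)` gives 1.0 and `high_seed_win_probability(8, 8)` gives 0.5, matching the endpoints described above. A matchup with only 10 recorded games, such as 12-seed vs. 13-seed, also falls through to the formula.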