The points you gave me, nothing else can save me, SOS
Several of my posts have referenced the “R-value”. I think most people realize it is some sort of statistical measure of a team’s strength, but they are confused by either its derivation or interpretation. I am long overdue on clarifying this.
Primarily, the R-value is a mechanism to rank teams who all played the same questions, but did not necessarily play each other. The two most useful applications for this are the Ontario regional-to-provincial and the Ontario provincial prelim-to-playoff qualification systems. Both have a large number of teams that need to be condensed to a small fraction of top teams that would proceed to a higher level, and they all played (roughly) the same questions.
A mechanism exists for this purpose in the US. National Academic Quiz Tournaments’ college program has a couple hundred university teams compete in regional tournaments, all vying to qualify for 64 spots in their national championship (across two divisions). The regional tournaments are all played on the same set of questions. Originally, NAQT used an undisclosed “S-value” to statistically determine which teams, beyond regional winners, deserved a spot in the national championship. With the cooperation of regional hosts providing stats promptly, NAQT could quickly analyze the results and issue qualification invitations a few days after the regional tournaments. Prior to the 2010 season, Dwight Wynne proposed a modified formula made transparent so all teams could verify their values were correct. NAQT adopted this, and named the mechanism the “D-value” in honour of Dwight. In 2015, the Academic Competition Federation introduced their “A-value” for national qualifications, which largely followed the D-value formula.
The R-value is a D-value modified for SchoolReach. The “R” stands for “Reach” or “Reach for the Top”. SchoolReach results typically lack the detailed answer conversion information available in quizbowl, so the R-value is dependent on total points and strength of schedule. I also added 2 modifications that I will get to later.
The R-value asks: “How does a team compare to a theoretical average team playing on the question set?” It is answered in the form of a percentage; if a team has an R-value of 100%, they were statistically average for the field. A step-by-step process to get there:
Note: my primitive embedding of LaTeX in WordPress is used below. It is possible it may not appear in your browser.
- First, calculate all teams’ round-robin points-per-game (RRPPG). All games which occur in a round-robin system are included, even if a team plays another team multiple times. Playoffs, tiebreaking games, and exhibition matches are excluded. If certain games are known to be “extended” (for example, double-length), that is reflected in the “RR games” total.
- With the RRPPGs known, determine each team’s round-robin opponent average PPG (RROppPPG). This is the average of the PPGs of each opponent a team played, double- or triple- counting where appropriate if they faced each other multiple times. Note: this is different from a team’s average points against, which is a different statistic that is not used in this analysis.
- The question set’s average points is also needed. This covers all pools and all sites where the questions were used for the purpose of the rank. I determine this average through total RR points and total RR games, so larger sites that have more games do end up with a larger influence on the set average.
- Strength of schedule (SOS) is a factor to determine how strong a team’s opponents were compared to facing an average set of opponents for the field. A value above 1 indicates a tougher than average schedule; below 1 is a lower than average schedule. In reasonably balanced pools, it is typical to have top teams below 1 and bottom teams above 1 – a top team doesn’t play itself, but its high point tally contributes to the total of one of its weaker opponents. Also, by comparing across multiple pools/sites, SOS can give an overview of how strong a pool/site was.
- Now for the biggest leap: the points a team earned must be modified to account for how strong its schedule was. Racking up 400 PPG is far more difficult against national contenders than against novices. Adjusted RRPPG multiplies points by the SOS factor – a tougher schedule gives a team a higher adjusted point total. This adjusted value theoretically represents a team’s PPG if they faced a slate of average teams. Note: this value is not shown in result tables.
- This value is suitable on its own for ranking. However, I add an extra step of normalizing for the set, so I can compare across years. Earning 400 PPG is far more difficult when the set average is 200 compared to a set average of 300. For example, the late ’90s/early ’00s had much higher set point totals than today (through different formats), and a normalization is needed to compare historical teams of that era to today. The calculated result is the raw R-value, which I convert to a percentage for easier comprehension of how much different from average a team is.
Raw R-value is the number I use for most comparison purposes. In earlier posts, I tried to show some examples of how this statistic is useful for predicting future performance (especially playoffs) and analyzing outlier results. If R-value is to be used for any sort of qualification system, however, it needs to account for the universally-accepted idea that it is most important to win games. Almost all tournaments use final ranks based primarily on winning (either in playoffs or just prelim results). A team with a low (raw) R-value that finishes ahead of a team with a high R-value deserves qualification just as much (if not more than) teams below them in the standings. The actual R-value is then calculated, based on NAQT’s system (quoting from their D-value page):
After the raw values are computed, they are listed in order for each [site] and a correction is applied to ensure that invitations do not break the order-of-finish at [a site]. Starting at the top of each [site], each team is checked to see if it finished above one or more teams with higher D-values. If it did, then that team and every team between it and the lowest team with a higher D-value are given the mean D-value of that group and ranked in order by their finish.
Let’s say a site winner had a raw R-value of 120% and the runner-up had a final upset while finishing with a raw R-value of 140%. Under this adjustment, both teams end up with the mean, 130%, for their true R-value. The winner receives a boost for finishing above one or more stronger teams, while the lower teams receive a penalty for not reaching their “potential”. The true R-values would then be compared across pools/sites for qualification purposes; if tied teams straddle the cutoff for qualification, invites are issued in order of rank at the tournament.
I do deviate slightly from this formula, though. It is possible, but rare, for the top-ranked team in this average to end up with a lower R-value for finishing higher than a stronger team (e.g: 1st 120%, 2nd 80%, 3rd 130%; all teams get 110%). I don’t believe this should ever happen. If it does, I modify the averaging by this algorithm:
- First, follow the NAQT algorithm
- If the first team in the averaging has their R-value lower than their raw R-value, ignore the last team (which has a higher raw R-value than the first team)
- Proceed to the team one rank above the formerly-last team and attempt the R-value average again. Repeat until the first team improves upon their R-value.
- Continue the NAQT algorithm with the next team after the new set of averaged teams
Look at the 2016 Ontario Provincials results for an example. Woburn had a very high raw R-value (131.8%), but finished very low (22nd). Under the basic D-value algorithm, 4th-placed London Central would have joined the big set of teams all the way down to Woburn, and ended up with a decrease in their R-value, thanks to the many intermediary teams with low raw R-values. Instead, Woburn was ignored, and the next-lowest team with a higher raw R-value (Hillfield at 132.9%) was tested. Again, this would drop Central’s R-value because of the low value for intermediary Marc Garneau. It is only an average with 5th-placed Waterloo that allows Central to improve on their raw result. From this, the algorithm goes to the next “unaveraged” team, Marc Garneau, who starts the group all the way down to Woburn because they earn a slight R-value boost. 6th through 22nd end up with a final R-value of 110.6% each.
And that’s how you get the R-value. The math isn’t that complicated, but it does require detailed number-crunching, especially for the opponent PPG step. Until more thorough result reporting occurs in SchoolReach, it is probably the best analysis that can be done with the information available. Thankfully, it is a fairly reliable metric for team performance, and I hope to show some examples in future posts.