The R-Value

The points you gave me, nothing else can save me, SOS


Several of my posts have referenced the “R-value”. I think most people realize it is some sort of statistical measure of a team’s strength, but many are unsure how it is derived or how to interpret it. I am long overdue in clarifying this.

Primarily, the R-value is a mechanism to rank teams that all played the same questions but did not necessarily play each other. The two most useful applications are the Ontario regional-to-provincial and the Ontario provincial prelim-to-playoff qualification systems. Both take a large field of teams, all playing (roughly) the same questions, and condense it to the small fraction of top teams that proceed to the next level.

A mechanism already exists for this purpose in the US. National Academic Quiz Tournaments’ college program has a couple hundred university teams compete in regional tournaments, all vying to qualify for 64 spots in their national championship (across two divisions). The regional tournaments are all played on the same set of questions. Originally, NAQT used an undisclosed “S-value” to statistically determine which teams, beyond regional winners, deserved a spot in the national championship. With regional hosts reporting stats promptly, NAQT could analyze the results and issue qualification invitations within a few days of the regional tournaments. Prior to the 2010 season, Dwight Wynne proposed a modified formula that was made fully transparent, so every team could verify that its value was correct. NAQT adopted it and named the mechanism the “D-value” in Dwight’s honour. In 2015, the Academic Competition Federation introduced an “A-value” for its own national qualification, largely following the D-value formula.

The R-value is a D-value modified for SchoolReach; the “R” stands for “Reach” or “Reach for the Top”. SchoolReach results typically lack the detailed answer-conversion information available in quizbowl, so the R-value depends only on total points and strength of schedule. I also added two modifications that I will get to later.

The R-value asks: “How does a team compare to a theoretical average team playing on the question set?” It is answered as a percentage; a team with an R-value of 100% was statistically average for the field. A step-by-step process to get there (a code sketch of the full computation follows the list):

Note: my primitive embedding of LaTeX in WordPress is used below; it may not render in your browser.

  • First, calculate every team’s round-robin points per game (RRPPG). All games played in a round-robin system are included, even when a team faces another team multiple times. Playoffs, tiebreaking games, and exhibition matches are excluded. If certain games are known to be “extended” (for example, double-length), that is reflected in the “RR games” total.
  • RRPPG=\frac{RRPts}{RRG}
  • With the RRPPGs known, determine each team’s round-robin opponent average PPG (RROppPPG). This is the average of the PPGs of the opponents a team played, double- or triple-counting where appropriate if they faced each other multiple times. Note: this differs from a team’s average points against, a separate statistic that is not used in this analysis.
  • RROppPPG=\frac{RRPPG_{opp_1}+RRPPG_{opp_2}+\dots+RRPPG_{opp_n}}{RRG}
  • The question set’s average points are also needed. This covers all pools and all sites where the questions were used, for the purposes of the ranking. I compute this average from total RR points and total RR games, so larger sites with more games have a proportionally larger influence on the set average.
  • SetPPG=\frac{\sum{RRPts}}{\sum{RRG}}
  • Strength of schedule (SOS) is a factor measuring how strong a team’s opponents were compared to an average slate of opponents from the field. A value above 1 indicates a tougher-than-average schedule; below 1, an easier one. In reasonably balanced pools, top teams typically sit below 1 and bottom teams above 1: a top team never plays itself, so its high point tally boosts its opponents’ schedules but not its own. Comparing SOS across pools and sites also gives an overview of how strong each pool or site was.
  • SOS=\frac{RROppPPG}{SetPPG}
  • Now for the biggest leap: the points a team earned must be modified to account for how strong its schedule was. Racking up 400 PPG is far more difficult against national contenders than against novices. Adjusted RRPPG multiplies points by the SOS factor – a tougher schedule gives a team a higher adjusted point total. This adjusted value theoretically represents a team’s PPG if they faced a slate of average teams. Note: this value is not shown in result tables.
  • RRPPG_{adj}=RRPPG \times SOS
  • This value is suitable on its own for ranking. However, I add an extra step of normalizing for the set, so that comparisons can be made across years. Earning 400 PPG is far more difficult when the set average is 200 than when it is 300. For example, the late ’90s/early ’00s had much higher set point totals than today (owing to different formats), and normalization is needed to compare historical teams of that era to current ones. The result is the raw R-value, which I express as a percentage for easier comprehension of how far from average a team is.
  • Rval_{raw}=\frac{RRPPG_{adj}}{SetPPG} \times 100\%
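
For the programmatically inclined, here is a minimal sketch of the whole computation in Python. The function name and input format are mine for illustration; I am assuming results can be flattened into simple (team, opponent, points) records, one per team per round-robin game (a double-length game can be entered as two records with its points split between them, so it counts twice in the games total).

    from collections import defaultdict

    def raw_r_values(games):
        """Raw R-values from (team, opponent, points) records."""
        points = defaultdict(float)    # total RR points per team
        played = defaultdict(int)      # total RR games per team
        opponents = defaultdict(list)  # opponents faced, repeats kept

        for team, opp, pts in games:
            points[team] += pts
            played[team] += 1
            opponents[team].append(opp)

        # RRPPG: each team's round-robin points per game
        rrppg = {t: points[t] / played[t] for t in points}

        # SetPPG: weighted by games, so larger sites count for more
        set_ppg = sum(points.values()) / sum(played.values())

        raw = {}
        for t in points:
            # RROppPPG: average of opponents' PPG, repeats counted again
            opp_ppg = sum(rrppg[o] for o in opponents[t]) / played[t]
            sos = opp_ppg / set_ppg          # strength of schedule
            adj_ppg = rrppg[t] * sos         # schedule-adjusted PPG
            raw[t] = adj_ppg / set_ppg * 100 # normalized, as a percentage
        return raw

As a sanity check, a team scoring exactly the set average against an exactly average schedule comes out at exactly 100%.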

Raw R-value is the number I use for most comparison purposes. In earlier posts, I tried to show examples of how this statistic is useful for predicting future performance (especially playoffs) and for analyzing outlier results. If the R-value is to be used for any sort of qualification system, however, it needs to account for the universally accepted principle that winning games matters most. Almost all tournaments base final ranks primarily on winning (whether through playoffs or prelim results alone). A team with a low raw R-value that finishes ahead of a team with a high raw R-value deserves qualification just as much as (if not more than) the teams below it in the standings. The actual R-value is therefore calculated using NAQT’s system (quoting from their D-value page):

After the raw values are computed, they are listed in order for each [site] and a correction is applied to ensure that invitations do not break the order-of-finish at [a site]. Starting at the top of each [site], each team is checked to see if it finished above one or more teams with higher D-values. If it did, then that team and every team between it and the lowest team with a higher D-value are given the mean D-value of that group and ranked in order by their finish.

Let’s say a site winner had a raw R-value of 120% and the runner-up, upset in the final, had a raw R-value of 140%. Under this adjustment, both teams end up with the mean, 130%, as their true R-value. The winner receives a boost for finishing above one or more stronger teams, while the lower teams receive a penalty for not reaching their “potential”. The true R-values are then compared across pools/sites for qualification purposes; if tied teams straddle the qualification cutoff, invites are issued in order of finish at the tournament.

I do deviate slightly from this formula, though. It is possible, though rare, for the top team in an averaged group to end up with a lower R-value for finishing above a stronger team (e.g. 1st at 120%, 2nd at 80%, 3rd at 130%: all three would get 110%). I don’t believe this should ever happen. If it does, I modify the averaging by this algorithm (a code sketch follows the list):

  • First, follow the NAQT algorithm.
  • If the averaging would leave the first team in the group with an R-value lower than its raw R-value, drop the last team in the group (the lowest finisher with a higher raw R-value than the first team).
  • Attempt the average again, ending at the next-lowest team with a higher raw R-value than the first team. Repeat until the first team improves upon its raw R-value.
  • Resume the NAQT algorithm with the next team after the newly averaged group.
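
Here is a minimal sketch of the full adjustment, again with illustrative names: `order` is the final order of finish at a site (best first), and `raw` maps each team to its raw R-value.

    def final_r_values(raw, order):
        """Order-of-finish correction with the modified averaging."""
        final = dict(raw)
        i = 0
        while i < len(order):
            top = order[i]
            # Finishers below `top` that beat it on raw R-value
            higher = [j for j in range(i + 1, len(order))
                      if raw[order[j]] > raw[top]]
            group_end = None
            # Try the lowest such finisher first (the plain NAQT group),
            # then shrink upward until the mean no longer drops `top`.
            for j in reversed(higher):
                mean = sum(raw[t] for t in order[i:j + 1]) / (j - i + 1)
                if mean >= raw[top]:
                    group_end = j
                    break
            if group_end is None:
                i += 1       # no group helps `top`; it keeps its raw value
                continue
            group = order[i:group_end + 1]
            mean = sum(raw[t] for t in group) / len(group)
            for t in group:
                final[t] = mean
            i = group_end + 1  # resume with the next unaveraged team
        return final

On the winner/runner-up example above, this gives both teams 130%; on the 120%/80%/130% example, the first team keeps its 120% while the second and third are averaged to 105% each, so the order of finish is preserved.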

Look at the 2016 Ontario Provincials results for an example. Woburn had a very high raw R-value (131.8%) but finished very low (22nd). Under the basic D-value algorithm, 4th-placed London Central would have joined the big group extending all the way down to Woburn, and would have seen its R-value decrease thanks to the many intermediate teams with low raw R-values. Instead, Woburn was ignored, and the next-lowest team with a higher raw R-value (Hillfield at 132.9%) was tested. Again, this would have dropped Central’s R-value because of the low value of intermediate Marc Garneau. Only the average with 5th-placed Waterloo allows Central to improve on its raw result. From there, the algorithm moves to the next unaveraged team, Marc Garneau, which starts a group running all the way down to Woburn because the averaging gives it a slight R-value boost. 6th through 22nd end up with a final R-value of 110.6% each.

And that’s how you get the R-value. The math isn’t that complicated, but it does require detailed number-crunching, especially for the opponent PPG step. Until more thorough result reporting occurs in SchoolReach, it is probably the best analysis that can be done with the information available. Thankfully, it is a fairly reliable metric for team performance, and I hope to show some examples in future posts.

Shootout theory

Boy, that title can be taken out of context.

In the 15 or so years that “shootouts” have existed in SchoolReach, they have been the most captivating part of a match. Over a blitz of questions, teams must demonstrate depth of knowledge across all four players, as correct answers slowly whittle down the field until all the pressure rests on the final teammate to earn the 40 points. It’s nail-biting, it’s a big swing of points, it’s…

…the least important stretch of a game.

Yes, I will argue that the shootout is insignificant to the point of irrelevancy for a good team. In fact, it can be a statistical annoyance in the context of a whole tournament. It just requires a different mindset.

The shootout offers 0 or 40 points over 12 questions. Let us assume that a match featuring at least one good team will see the 40 points attained, and not let all that buzzing go to waste. The shootout thus offers 3.33 potential points per question (PPPQ). Compared to other types of questions:

  • List question: 50 PPPQ
  • “What-am-I?”: 40 PPPQ
  • 20-point special: 20 PPPQ
  • Team scramble: 10 PPPQ, but an effective 40 if all the “potential” is dependent on the first part
  • Snappers/open questions: 10 PPPQ
  • Assigned questions: 10 PPPQ, but depends on opponent being incorrect every time
  • Relay: 6.25 PPPQ (other half of relay is unavailable to one side)
  • Shootout: 3.33 PPPQ

But why is this relevant? Shouldn’t 40 points from a scramble/bonus group or a “what-am-I?” be the same as 40 points from a shootout? Yes, it’s still 40 points, but it is an extremely inefficient source of points on which to focus. A subsequent span of 12 open questions puts 120 points on the table (three times the shootout’s maximum), easily enough to recover from any shootout loss, and a correct team scramble opener gives you all the shootout’s potential with one buzz.

Point efficiency matters because a game is limited to 80-90 questions. Earning points is not only critical for winning (obviously) but also for improving your position in the tournament standings, through seedings and tiebreaks. In fact, the mere existence of a shootout can have more impact on your standing than its outcome! See below:

Suppose you are a reasonably good team, averaging over 300 points per game (more than a third of the available points) in a tournament with a bye round. Your reasonably good rival in the standings sat out the round whose set contained a shootout, which you had to play; your own bye fell on a set without one, where those 12 questions were instead a mixture of 10 PPPQ formats (assigned, open, etc.). Playing those 12 filler questions, your rival would earn more than a third of the points on average, which is already more than the 40 points you would gain from winning your shootout.
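
A back-of-the-envelope check in Python, with the one-third share standing in for “reasonably good”:

    FILLER_QUESTIONS = 12   # questions replacing the shootout in one set
    FILLER_PPPQ = 10        # open/assigned formats, per the list above
    SHOOTOUT_POINTS = 40    # the most a shootout can ever pay out

    available = FILLER_QUESTIONS * FILLER_PPPQ  # 120 points on offer
    rival_points = available / 3                # 40.0 on average

    # Any share above one third beats even a won shootout outright.
    print(rival_points >= SHOOTOUT_POINTS)      # True

So the swing from this scheduling quirk alone matches or exceeds the entire value of winning your shootout.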

When I ran regional tournaments, I reviewed the sets in advance to determine the potential points in each match, and normalized scores so that byes would not skew the field. As far as I can tell, no one else in the history of SchoolReach has done this; standings are simply based on actual points. If every set consistently had exactly one shootout per game, this would be less of a concern, but that is not the case.

I hope I have demonstrated that, in theory, shootouts are not worth their perceived importance. Unfortunately, the issue of morale remains. Shootouts are inherently set up as a momentum swing that can start an underdog comeback or solidify a lead. They are also a gimmick that raises the chance of upsets, since upsets are usually more likely when fewer total points are available. The best thing a good team can do is find a “mental zone” and ignore the effects of a shootout, good or bad. A good team should know that both a win and a loss are insignificant compared to a good buzz on a “what-am-I?” or team scramble, and that a stretch of 12 open questions has more impact than all the time spent on a shootout. Of course you should still attempt a shootout, but don’t fret over it…

…Worry about the “what-am-I”. But that’s another story.