Feb 3, 2010

The Freeze Ratings: an explanation, with the ratings to follow in the next post

Ok, so this post will be a little long, but I promised an explanation of the Freeze Ratings ...
So, without further ado, here is the explanation presented by Geoff Freeze, the creator of the Freeze Ratings.
An FYI, I did not realize he developed the Freeze Ratings that are used in college football.



The Freeze Ratings - How They Work

Overview

The Freeze Ratings are computer ratings of sports teams based on an evaluation of comparative scores (e.g., points for and against in football and basketball, goals for and against in soccer, sets won and lost in volleyball). The Ratings are a form of decision analysis where numerical algorithms are used to make quantitative ranking decisions amongst teams.

The rating system includes options to adjust the comparative scores to consider the following factors: home team advantage, diminishing returns for large victory margins, greater weighting for more recent games, greater weighting for games against similarly rated opponents, lesser weighting for outlier/anomalous results, no "false" credit/penalty from mismatched opponents, and upset corrections (more specific details are provided below). These adjustments contribute to making the Freeze Ratings more reflective of the actual relative strengths of teams. By explicitly evaluating the comparative scores (with the aforementioned adjustments), the rating system inherently accounts for strength of schedule, although no explicit strength of schedule “factor” is calculated. A team's Freeze Rating reflects the average of the adjusted comparative scores from all of its games.

It is also important to note that the Freeze Ratings provide relative power ratings based on past performance. Because the ratings are based on scores, the difference between two teams’ ratings is a direct indication of the expected score differential if those two teams were to play. Therefore, the Freeze Ratings can also be used for predicting future scores. However, because they are based on past performance, predictions might also need to factor in extenuating circumstances such as key injuries, home field, etc.
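To make the prediction idea concrete, here is a minimal Python sketch. It is not Geoff's code: the function name expected_margin, the home_edge parameter, and the two ratings are made up for illustration. It only shows that a predicted margin is the difference between two ratings, optionally shifted for home field.

```python
def expected_margin(rating_a, rating_b, home_edge=0.0):
    """Expected points by which team A beats team B (negative: B favored).

    home_edge is an assumed, tunable value: positive when A is at home,
    negative when B is at home, and 0 on a neutral field.
    """
    return (rating_a - rating_b) + home_edge


# Hypothetical ratings, for illustration only.
ratings = {"Team A": 31.5, "Team B": 24.0}

neutral = expected_margin(ratings["Team A"], ratings["Team B"])
at_b = expected_margin(ratings["Team A"], ratings["Team B"], home_edge=-3.0)
print(neutral, at_b)
```

Injuries and similar circumstances are not captured here; as noted above, they would have to be considered separately.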

In addition to providing relative power ratings, the Freeze Ratings also provide internally consistent listings of standings, points scored (both total and per-game averages), and team schedules (with scores and game-by-game ratings). The Freeze Ratings also contain methods for checking input scores for errors and anomalies.

Comparative Score Analysis

The following provides an example of comparative score analysis. Suppose Team A beats Team B 7-0 and Team B beats Team C 21-6. Comparative scores (without adjustment) suggest that Team A is 7 points better than Team B (based on the actual score) and 22 points better than Team C (based on the comparative scores: 7 points plus Team B's 15-point margin over Team C). As a season progresses, there are numerous games and numerous actual and comparative scores that provide comparative “links” between teams. It generally takes several games before all of the teams in the system (e.g., all Division I NCAA teams, or all high school teams in the state) are fully linked and the ratings are truly unbiased. The number of games required for full linking depends on the number of teams in the system; for typical systems of 100-200 teams, it takes 4 or 5 games. Once teams are fully linked, ratings can be compared across divisions, classes, or conferences. Note that in some cases there is never full linking: 6-man, 8-man, and 11-man football teams cannot be cross-linked, and there is a Division III football conference that only plays intra-conference games, so those teams cannot be compared to any other conferences.
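The A/B/C example can be sketched in a few lines of Python. This is only an illustration of unadjusted comparative scores, not the Freeze Ratings code: the function name comparative_margin and the extra X/Y game are assumptions added here. It adds up margins along the shortest chain of games linking two teams, and returns None when the teams are not linked at all.

```python
from collections import deque


def comparative_margin(games, start, target):
    """Sum margins along the shortest chain of games linking start to target.

    games: list of (winner, loser, winner_points, loser_points).
    Returns None when no chain of games links the two teams.
    """
    # Each game is a link usable in both directions, with opposite sign.
    links = {}
    for winner, loser, won, lost in games:
        links.setdefault(winner, []).append((loser, won - lost))
        links.setdefault(loser, []).append((winner, lost - won))

    # Breadth-first search, carrying the running margin relative to start.
    seen = {start: 0}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        if team == target:
            return seen[team]
        for opponent, margin in links.get(team, []):
            if opponent not in seen:
                seen[opponent] = seen[team] + margin
                queue.append(opponent)
    return None


# X vs. Y is a made-up game between two teams not linked to A, B, or C.
games = [("A", "B", 7, 0), ("B", "C", 21, 6), ("X", "Y", 14, 10)]
print(comparative_margin(games, "A", "B"))  # 7 (actual score)
print(comparative_margin(games, "A", "C"))  # 22 (7 + 15, comparative)
print(comparative_margin(games, "A", "X"))  # None (not linked)
```

In a real season there are usually many chains between two teams, and they rarely agree with each other, which is why a rating has to average over all of them rather than follow a single chain.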

The rating system also inherently rewards “actual” results more strongly than “comparative” results. In the example above, Team A is rated ahead of Team B based on the actual result. It would take a preponderance of comparative results favoring Team B over Team A (one such result would be if Team A beat Team C by, say, only 1 point) to overcome the actual result and move Team B ahead of Team A in the ratings. Team B could move very close to Team A in the ratings, but would have difficulty moving ahead of them (more details about this are described in “Adjustments for Upsets” below). The most common way for Team B to “pass” Team A would be for Team C to beat Team A. Then there is no unique ordering of the 3 teams based on actual scores, because each team would be 1-1 against the other two. In this case, the comparative scores take on added importance. This situation of non-unique ordering by actual scores is quite common, whether amongst 3 teams or, more often, among a large number of teams.
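Here is a rough Python sketch of the three-team circle, using plain averaging only. It is a simplification under stated assumptions, not the actual Freeze Ratings calculation: it has no extra weight for actual results, no upset correction, and none of the other adjustments. The function name average_ratings and the 1-point win by Team C over Team A are made up for illustration. Each team's rating is repeatedly reset to the average of (opponent rating + margin) over its games, and the ratings are then shifted so they average zero.

```python
def average_ratings(games, sweeps=100):
    """Rate each team as the average of (opponent rating + margin) over its games.

    games: list of (winner, loser, margin). Ratings are re-averaged in place
    for a fixed number of sweeps, then shifted so the mean rating is 0.
    """
    results = {}
    for winner, loser, margin in games:
        results.setdefault(winner, []).append((loser, margin))
        results.setdefault(loser, []).append((winner, -margin))

    ratings = {team: 0.0 for team in results}
    for _ in range(sweeps):
        for team, played in results.items():
            ratings[team] = sum(ratings[opp] + m for opp, m in played) / len(played)

    mean = sum(ratings.values()) / len(ratings)
    return {team: r - mean for team, r in ratings.items()}


# A beat B by 7 and B beat C by 15 (from the example above); the 1-point
# win by C over A is a made-up result that closes the circle.
games = [("A", "B", 7), ("B", "C", 15), ("C", "A", 1)]
ratings = average_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 2))
```

With these three results, plain averaging puts Team B slightly ahead of Team A even though A won the head-to-head game, because B's 15-point win over C outweighs it. The real ratings may order the teams differently, since they credit actual results more strongly than this sketch does.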

Comparative score analysis is theoretically applicable to any sport where head-to-head scoring determines the outcome. Over the past several years, the Freeze Ratings have been used for the following sports:

· Football: NFL; College - all divisions of NCAA and NAIA; High school - Texas, New Mexico, Arizona, Colorado, and Utah

· Basketball: College - men's and women's NCAA Division I; High School – New Mexico boys and girls; AAU - summer club teams

· Soccer: FIFA – world rankings; High School – New Mexico boys and girls

· Volleyball: High School – New Mexico

(A side note about volleyball: I wasn't sure if the limited score possibilities (3-0, 3-1, or 3-2) would provide enough differentiation among teams, but it actually works just as well as for football and basketball. It further supports the point that it is more important who you beat than by how much, and that blowouts aren't a significant discriminator. In volleyball there are no blowouts, since you can't do any better than 3-0.)

Specific Details of Score Adjustments

The following adjustments contribute to making the Freeze Ratings more reflective of the actual relative strengths of teams and also enhance their predictive capability. The relative importance of each of these adjustments is controlled by input parameters and is variable by sport and by skill level (i.e., college adjustments may be different than high school adjustments).

The general philosophy for making these adjustments is to maximize the value of games between similarly ranked opponents and to minimize the value of games between mismatched opponents (i.e., blowouts). The difference between a 7-point win and a 14-point win against a similar opponent is far more important than the difference between a 25-point win and a 50-point win against an inferior opponent (i.e., it doesn't help your rating to run up the score).

· Home Field/Court: Comparative scores can be adjusted to consider home field/court advantage (e.g., 3-4 points for college football and basketball, less for high school sports)

· Margin of Victory (Diminishing Returns): Large margins of victory (e.g., greater than 14 points in football) are credited on a scale of diminishing returns. For example, a 38-point victory may only be worth 26 points in the ratings. This prevents a team from getting an artificially high rating by running up the score against weak opponents. Similarly, small margins of victory, particularly against a closely matched opponent, can provide enhanced returns, rewarding teams that are repeatedly able to win close games.

· Most Recent Games Worth More: More weight is given to the most recent games.

· Strength of Opponent: More weight is given to games played against an opponent with a similar Freeze Rating. The relative strengths of teams can be better determined by comparing their performance in games against similarly matched opponents than by comparing their performance in blowouts (e.g., pressing basketball teams or passing football teams will typically run up large blowout margins against weaker teams, which can overestimate their true strength when matched against a stronger opponent). Again, this prevents a team's rating from being skewed too much by blowout victories or losses.

· "Outlier" Results Worth Less: In cases where a team has 1 or 2 results that are significantly different from the rest of its results, these anomalous or "outlier" results are given less weight. Typically, these outlier results are from games where one of the teams had either positive or negative extenuating circumstances (e.g., key injuries) and don't properly reflect the relative strengths of the two teams. It might also result from an erroneous input score. (Erroneous input scores, such as a typo in a score or a JV team being listed as a varsity team, are minimized in the Freeze Ratings by detailed checking of input scores, both manually and by the computer. However, this is a particular problem in some of the online systems, such as Maxpreps, where a single erroneous score can skew the ratings, especially if it is associated with a key “linking” game).

· No False Credit/Penalty from Mismatched Opponents: In the course of a season there are games where two extremely mismatched opponents play. In these cases, the difference in ratings may be higher than a reasonable margin of victory (especially in high school with running clock provisions in blowouts). For example, a basketball team may be a 50-point favorite, but only win the game by 30 points. In this case, there should be no penalty to the favored team (i.e., without adjustment, they performed 20 points below their average), nor any corresponding bonus to the underdog (i.e., 20 points above their average). The no false credit adjustment identifies these results and eliminates them from the calculation of a team’s average rating.

· Adjustments for Upsets: In the course of a season there are always games where a team that is weaker over the entire season wins a single game against a stronger opponent. The Freeze Rating approach identifies these games and partially reverses the comparative score credit.

· Preseason Ratings: Early in the season (i.e., before all teams are “linked”), the Freeze Ratings are more realistic when based partially on comparative scores and partially on preseason power ratings. Further into the season, as the number of comparative scores increases, the weight given to a team's preseason rating decreases. By about the 4th or 5th game, a team's Freeze Rating is based solely on the average of all of its comparative scores, and is therefore completely unbiased.
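A few of the adjustments above can be sketched in Python to show the general shape of the idea. None of this is the actual Freeze Ratings code, and the real input parameters are not published here, so every function name and number below is an assumption. The 14-point knee comes from the football example above, and the 0.5 slope was picked only because it happens to turn a 38-point win into 26 points. The 25-point mismatch threshold and the straight-line fade of the preseason rating over 5 games are placeholders.

```python
def neutralize(margin, home_edge=0.0):
    """Remove home advantage from a margin seen from one team's side.

    home_edge > 0 if that team was at home, < 0 if it was away (assumed value).
    """
    return margin - home_edge


def credited_margin(margin, knee=14.0, slope=0.5):
    """Diminishing returns: full credit up to knee, partial credit beyond it."""
    size = abs(margin)
    if size > knee:
        size = knee + slope * (size - knee)
    return size if margin >= 0 else -size


def is_false_credit(rating_gap, margin, mismatch=25.0):
    """True when a heavy favorite won big, but by less than the rating gap.

    rating_gap and margin are both taken from the favorite's side.
    """
    return rating_gap >= mismatch and mismatch <= margin < rating_gap


def blended_rating(preseason, season_average, games_played, phase_out=5):
    """Fade the preseason rating out in a straight line over phase_out games."""
    weight = max(0.0, 1.0 - games_played / phase_out)
    return weight * preseason + (1.0 - weight) * season_average


print(neutralize(10, home_edge=3))   # 7: a 10-point home win on a neutral field
print(credited_margin(38))           # 26.0, matching the 38 -> 26 example
print(is_false_credit(50, 30))       # True: result is left out of the average
print(blended_rating(20.0, 10.0, 2))
```

The recency, strength-of-opponent, outlier, and upset adjustments would work as per-game weights or corrections on top of these; they are left out here because the text does not give enough detail to sketch them without guessing.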

About the Developer

The Freeze Ratings were developed by and are compiled by Geoff Freeze of Albuquerque, NM. Geoff is a civil / environmental engineer with degrees from the University of British Columbia and Texas A&M University. His expertise is in the application of computer models and decision analysis techniques to environmental problems, specifically groundwater contamination and nuclear waste disposal. Geoff is a former college basketball player, now married with two children. He stays active in recreational sports and through coaching and officiating youth sports.

The Freeze Ratings were first produced in the late 1980s. They have been featured on local television and radio programs in Albuquerque and have been published in the Dallas Morning News, the Albuquerque Journal, and on the Internet. The Freeze Ratings always agree well with various polls and can predict the winner of a game about 85% of the time. In one online comparison (for college football), the Freeze Ratings were found to be the best (of over 50 different systems, including the BCS systems) in terms of matching/beating the Las Vegas spread. In comparisons with NM high school state tournament seeding, the Freeze Ratings correctly predict “upsets” by lower seeds over higher seeds more than 50% of the time.

