What do I do when I'm bored? Other than play Street Fighter II on my Super Nintendo emulator (your nerd jokes can't hurt me, I'm proud of it), I think about the Horns, the BCS, and football in general. I read what I can, write a few things, and even re-watch games, much to the detriment of my fall studies. Because I retain my status as a contributing author, I will subject the community to more of my thoughts before PB pulls the plug on my ramblings.
Speaking of Street Fighter II, I must say that it is pretty satisfying to whip the opponent without getting touched to earn a "Perfect" from the announcer (or a "flawless victory" from you Mortal Kombat losers). "Perfect" means you totally destroyed the other fighter, and your reward is a mountain of bonus points. In fact, the more health you have and the more time you have left when you win a round, the more total points you get. Thus, if a player were to go undefeated in all 12 matches and win quickly and impressively, he could expect to be high in the score rankings, as opposed to some lucky bozo who had to go to Round 3 in every match and squeak by while getting bloodied up. It doesn't matter if Player 2 was undefeated too; Player 1 won with dominance, and therefore, he has a better score and is the better Street Fighter player. Sounds perfectly logical, right?
In yo face, Guile.
But wait, says Player 2. He played on Hard difficulty, while Player 1 played on Medium. The fact that Player 1 dominated Medium difficulty is much less impressive than him navigating Hard difficulty without a loss. As they argue, a third player enters and says triumphantly that while he lost one close match, the fact that he played on Hardest difficulty and finished the game means he's the superior player. Player 2 balks at this and says that there is no material difference between Hard and Hardest, while Player 1 shouts that it doesn't matter what difficulty he played on: He lost! The three players bicker endlessly before bedtime, at which time they quietly wonder why they don't have girlfriends.
Where am I going with this? Well, I'm sure I don't need to tell you that this hypothetical nerd debate sounds frighteningly similar to common BCS arguments that rage among fans and pundits. Arguably, no topic is as controversial as the dreaded phrase "style points," an ambiguous term that can cover strength of schedule, margin of victory, offensive statistics, and almost anything else fans want to include. College football fans will generally agree that teams need to play well against both good and bad competition to warrant consideration among the nation's best, but they disagree mightily on what constitutes "playing well." The addition of "objective" computers really hasn't eased these debates; if anything, it has made them worse.
Among the many factors people consider, I want to focus on margin of victory (though it is naturally tied to strength of schedule), a controversial metric used to try to separate teams high in the BCS rankings. I'll discuss the utility of margin of victory (henceforth MoV), its flaws, and then I'll propose some tentative, general ideas on how it could be added back into the computers in a reasonable way. This is just a thought exercise, as I do not expect any such changes to the BCS, but it is something fun to think about.
The (Un)Reliability of Margin of Victory
MoV is one of those factors that seem easy to interpret at the outset, which is why many analysts love to appeal to it. It's just basic arithmetic, which is "objective," correct? Uh, not quite. While almost any voter or power poller considers MoV, how one applies it to his rankings may differ widely. What looks like a simple consideration turns into yet another ambiguous statistic upon closer inspection. There are probably three main reasons for this:
- Strength of Schedule: The most obvious is the strength of schedule. Beating UTEP by 50 is not the same as beating USC by 50. I think most would agree with that. However, is beating Toledo by 45 the same as beating Texas A&M by 21? Is thrashing Ohio State, say, 56-0 the same as beating Florida 10-7? Now that's where things get really tricky.
- Running Up the Score/Garbage Points: Normally, when we see a three-score victory in the box score, we naturally assume an easy win for the victor. However, this is not always true; Iowa's recent 18-point victory over Indiana is one game where the final score does not do justice to the competitiveness of the whole contest. Similarly, if we see a 10-point win, we might think it was a close game, but it may be that the winning team was up by 24 the entire way and only two meaningless touchdowns on the loser's last two drives made the score more respectable.
- Difference in Scoring Pace: Even when the MoV is identical, a 21-0 game is not quite the same as a 56-35 game. The first features a shutout, which some voters love, while the second features tons of points, which other voters love. There is, at least, some sort of difference between these two games.
These reasons are important to discuss because they not only make the MoV ambiguous, they can make it downright misleading. If a team wants to look better than it actually played, it might run up the score with meaningless field goals or touchdowns to fool those who only look at the final score (and computers only look at the final score). Ditto for teams that are way behind and try to make the score more respectable with garbage points. In Ole Miss' first game, for example, Houston Nutt intentionally ran up the score to hide the fact that the Rebels were in a dogfight against not-so-mighty Memphis for much of the game. Similarly, a 56-0 win looks nice even if it came against a sorry FCS opponent. And if the hypothetical 21-0 and 56-35 games are treated as equal, that does no justice to the defense that earned the shutout in the first game or the offense that was so prolific in the second. So while many, like Dr. Saturday, think removing MoV from the computers was a stupid idea, it is easy to see the concerns about computers, which do not watch games, using such a potentially misleading statistic. Even if you don't care a lick about the potential sportsmanship ramifications, these are powerful arguments against using MoV.
However, most viewers understand that there is at least something to be gleaned from a big win. Elite teams will bury inferior competition and even whip good competition from time to time. Most Texas fans will say that thrashing Oklahoma State in Stillwater 41-14 earned Texas some serious respect. Without a doubt, Oregon fans (and by extension, Boise State fans) will make a big deal out of the Ducks not only defeating USC, but hammering them by 27. Likewise, most people (and even some Iowa fans) will say that Iowa barely surviving against Northern Iowa is a legitimate point to bring up, as is their close game with Arkansas State. This is simply an intuitive grasp of what happened in those games, and many people argue that this aspect needs to be reflected in the computers in one way or another. In other words, the computers should not just take into account wins and losses, but how the teams actually looked on the field, which is at least partially captured in MoV. Just as not all wins and losses are equal based on SoS, neither are all wins or losses equal based on MoV.
Personally, I was at first in favor of eliminating MoV from the computers; I felt that it was much too misleading a factor for them to handle. If MoV is to mean anything, I figured the voters, who make up 2/3 of the BCS formula, would be enough to take it into account. Unfortunately, the voters do as poor a job of processing MoV as a wooden computer might, or worse; many of them fail to watch games closely and simply rely on box-score stats. Since we're stuck with the computers for the foreseeable future (much to whills' displeasure), I have thought of some ways to potentially add MoV back into the computers while minimizing the chance of deceptive scores skewing the results. I'm not a programmer, so I'm not even going to pretend to know how to actually implement these in the computers; they're just general concepts that could be useful.
Cap the MoV: I think most would agree that winning by 35 and winning by 45 are not materially different. This is a very simple thing to do that would be a good start, but it definitely would not prevent teams from running up the score to reach that number or trying desperately to score garbage points to get under the threshold.
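Programmer or not, the cap itself is trivial to express. Here's a minimal sketch in Python, with the cap of 35 being purely my own assumption drawn from the "winning by 35 vs. 45" example, not a number from any actual BCS computer:

```python
# Hypothetical MoV cap: any margin beyond the cap counts the same.
# The cap value of 35 is an assumption for illustration only.

def capped_mov(winner_points, loser_points, cap=35):
    """Return the margin of victory, capped at `cap` points."""
    margin = winner_points - loser_points
    return min(margin, cap)
```

Under this cap, a 56-0 blowout and a 45-10 blowout earn identical credit, while a 10-7 squeaker still registers as a 3-point win.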
Track "Close Game" Stats: At Rock M Nation, Bill C. uses categories called "close game %" and "close game stats." He defines a close game as one that is within two scores, and "close game %" tracks what percent of the entire game was within that margin (if a game was within two scores for only 15 minutes, the game was only 25% close), while "close game stats" are the actual offensive and defensive statistics he tracks within that time frame. This can be very helpful in separating "garbage" points and statistics from numbers that actually made an impact on the game. For example, when discussing the Missouri-Texas game last year, Bill C. makes this observation:
So what happens if you're, say, Carl Edwards, and you have a mishap on the track in Lap 1 of a race? Your car is able to keep going, but you end up laps behind. You try to drive well the rest of the race, but it just doesn't matter because your day ended in the first minute, and the next three hours are just a formality. This game is why my 'close game' designation was created. While the overall stats tend to say something, the 'close game' stats are what count. The overall stats say this was an offensive shootout, with Texas just having more success in the end. The 'close game' stats say this was never a game, not even for one second. (emphasis mine)
And regarding close game %:
This game was also why I do the "% close" number. Penn State ended up beating Michigan by a bigger margin than Texas did over Missouri. But PSU-Mich was 80.0% close (i.e. 80% of the plays took place while the game was within two possessions). UT-MU wasn't even 30% close.
We might quibble over what constitutes a "close game," but I think this concept could go a long way toward weeding out the meaningless points that teams can put up. Taking this and the MoV cap together: if you and your opponent are within one score all game but you run up the score at the end to hit a 21-point lead, the close game % would neutralize those points and compute the game as it really was: a close game. In a similar vein, if you rush out to a 28-point lead and coast, and the opposing team scores two touchdowns in the waning moments of the fourth quarter against your backups, the close game stats would still register a blowout.
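To make the idea concrete, here's a rough Python sketch of how a computer could calculate close game %. I'm assuming "within two scores" means a margin of 16 points or fewer, and that we have the score margin at the time of every play; the data format is invented for illustration:

```python
# Hypothetical "close game %": the fraction of plays run while the score
# was within two possessions (assumed here to mean a 16-point margin).

def close_game_pct(margins_by_play, threshold=16):
    """Fraction of plays that occurred while the margin was within `threshold`."""
    if not margins_by_play:
        return 0.0
    close = sum(1 for m in margins_by_play if abs(m) <= threshold)
    return close / len(margins_by_play)
```

A game that spends most of its plays at a 21-point margin scores low here no matter how garbage-time touchdowns dress up the final score.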
Differentiate MoV by level of competition: I am unsure whether the computers already did this before the ban; in any case, it seems obvious to reward blowing out a good team more than blowing out a terrible one. Beating an FCS opponent by the threshold of 35 (or whatever it is deemed to be) would count for very little; dominating another Top Ten team, however, could earn a team some decent points.
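One crude way to express this weighting in Python; the 0-to-1 opponent rating scale and the simple multiplication are entirely my own illustrative assumptions, not anything a real BCS computer does:

```python
# Hypothetical weighting: capped MoV scaled by opponent quality, where
# opponent_rating runs from 0.0 (a sorry FCS team) to 1.0 (a top-ten team).

def mov_credit(margin, opponent_rating, cap=35):
    """MoV credit earned, scaled so blowing out a strong opponent counts more."""
    return min(margin, cap) * opponent_rating
```

A 35-point win over a bottom-feeder rated 0.1 earns 3.5 points of credit; the same margin against a 1.0-rated powerhouse earns the full 35.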
Track Offensive and Defensive Advanced Statistics: Again, I will draw from Bill C.'s excellent Beyond the Box Score methods. One good way to properly reward and punish teams for on-field performance is to track efficiency, explosiveness, and so on, rather than just raw totals. Thus, while a team might have a large MoV, it may still get docked for playing poor defense or sputtering on offense. This is a good way to treat different games with the same MoV more accurately. This, of course, is also checked by the close game stats, which would minimize the impact of numbers piled up in garbage time.
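To show how the close-game filter and the advanced stats could combine, here's a toy Python example: per-play efficiency computed only over plays run while the game was close. The play-record fields and the 16-point threshold are invented for illustration, not Bill C.'s actual implementation:

```python
# Toy "close game stats": yards per play, counting only plays run while
# the margin was within two scores (assumed here to be 16 points).

def close_game_yards_per_play(plays, threshold=16):
    """Average yards per play over plays run while the game was close.

    Each play is a dict like {"margin": 7, "yards": 12}.
    """
    close_plays = [p for p in plays if abs(p["margin"]) <= threshold]
    if not close_plays:
        return 0.0
    return sum(p["yards"] for p in close_plays) / len(close_plays)
```

Stats piled up at a 20-plus-point margin simply never enter the average, which is the whole point of garbage-time filtering.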
Perhaps whills is right that the best change the BCS could make right now is to eliminate the computers and expand the voter base. That's not going to happen, so if we're stuck with these machines, we might as well attempt to more closely simulate the act of watching games by using advanced metrics along with MoV to help them differentiate better between games, and therefore, teams. Perhaps, then, the computers can actually help us answer questions such as, "Did Boise State dominate Oregon, or did both teams just suck it up and Oregon sucked worse?", "How good are Iowa's and Cincy's wins?", and, perhaps even retroactively, "So... who really deserved to win the Big 12 tiebreaker last year?" In any case, I know this much: MoV matters, but it is so susceptible to being skewed that I just can't endorse applying it robotically, either by a computer or by a human. If it's going to be in the computers, it needs some serious checks.