Go Back to Your Shanties, People
When PB passed me a link to this story, I read the headline and thought: "Oh, one of many undefeated teams in college football is listed as #1 by a computer after two data points. Ho hum." After reading the "journalism" attached to that headline however, I feel that a bit of a response is warranted - If only to put up something for the reasonable among us to point to when the dentally-impaired among us begin their standard refrain of "LOUD NOISES" at the appearance of any potential ammunition against the BCS.
One of the big problems with public debate and opinion is that some very vocal participants are unwilling or unable to realize that an argument's objective soundness is not a function of whether it supports or refutes their side of the argument (i.e. there may exist stupid arguments that support your opinion and smart ones that refute it). One side of that failure (not listening to smart arguments against you) is exactly the reason we have "issues" instead of "solutions" in the majority of popular topics in the media from the serious (politics, religion, football, etc.) to the trivial (politics, religion, football, etc.). The media is no better than the information it carries, and has grown into a machine for fanning the flames in these futile debates with stupid arguments in favor of one side or another, and the tone of this article and the careful choice of facts within it is a perfect example. This kind of journalism is like the fire department gifting free oil lanterns to the night shift at a grain elevator; they're just looking for work.
I see some of us have already indicated the appropriate response to this story, but I'd like to take this opportunity to point out what's really going on with the Colley Matrix. This is not an indictment of the Colley Matrix or any other BCS computer poll (this is). In fact, I would say that the Colley Matrix is one of my favorite computer polls for a number of reasons. In fact in fact, I will be saying that right after the jump, so put away your pitchforks, douse your torches and click ahead.
The Colley Matrix is really quite an elegant way of providing a ranking. Wes Colley is kind enough to supply a thorough description of his method on his website, which anyone with a vaguely mathematical degree in college should be able to understand at least conceptually. I'll summarize here, but I encourage everyone to check it out because it's a really clever way of creating a ranking scheme that's totally objective.
In kind of a hand-waving description, the Colley Matrix method is to rate all of the teams based on winning percentage, then adjust those ratings based on the ratings of the teams they played (their strength of schedule). Then take the new ratings and adjust them again based on the new strength of schedule derived from the adjusted ratings of opponents. Then iterate the process until all of the ratings stabilize and quit reacting to successive iterations. The result is that a team's rating starts off somewhere, then is adjusted by smaller and smaller amounts until it zeroes in on its final value. Colley runs these upwards of 60 iterations to narrow the errors in the calculation of the ratings down to one part in ten million, so you don't have to worry about computational errors here. This whole process can be alternately formulated as the solution to a system of N linear equations, where N is the number of teams being rated, which can be solved quickly by certain well-developed methods in the field of linear algebra (it's worth noting here that "quickly" is probably measured in microseconds. I love technology!).
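For the curious, the iteration described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the update rule from Colley's published description: each team's rating is recomputed as (1 + (wins - losses)/2 + sum of its opponents' current ratings) / (2 + games played). The function name `colley_ratings`, the (winner, loser) input format and the three-team toy schedule are made up for illustration; this is not Colley's own code.

```python
def colley_ratings(teams, games, iterations=60):
    """Rate teams from win-loss results only, by repeated adjustment.

    teams: list of team names
    games: list of (winner, loser) pairs for games already played
    """
    wins = {t: 0 for t in teams}
    losses = {t: 0 for t in teams}
    opponents = {t: [] for t in teams}  # one entry per game played
    for winner, loser in games:
        wins[winner] += 1
        losses[loser] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    # Start everyone at 0.5; the first pass then yields the adjusted
    # winning percentage (1 + wins) / (2 + games).
    ratings = {t: 0.5 for t in teams}
    for _ in range(iterations):
        # Each pass folds in the opponents' ratings from the previous pass,
        # which is the strength-of-schedule adjustment.
        ratings = {
            t: (1 + (wins[t] - losses[t]) / 2
                + sum(ratings[o] for o in opponents[t]))
               / (2 + len(opponents[t]))
            for t in teams
        }
    return ratings


if __name__ == "__main__":
    toy_games = [("A", "B"), ("A", "C"), ("B", "C")]
    for team, rating in colley_ratings(["A", "B", "C"], toy_games).items():
        print(team, round(rating, 4))
```

On the toy schedule (A beats B and C, B beats C) the ratings settle at roughly 0.7, 0.5 and 0.3. Nothing but wins, losses and who played whom goes in.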
So, if you'll notice, the only information used to create a team's rating is the win-loss record of every team - No ad hoc adjustments for home/away, yardage, points scored, points given up or anything else. With the Colley Matrix, you are guaranteed perfect objectivity; the method doesn't care how you win your games so it can't promote one style or another of football. Also, you should expect a system like this to give funny results early in the year, since there is so little information available to it after two games, and the system doesn't use the previous season as a seed or starting point. Ergo, Texas at #1 simply means that the intricacies of the interactions between wins and losses all across the country make the Longhorns' two wins look better than anyone else's after two games. No controversy, no indication of bias or conspiracy, go back to your shanties, people; there is no monster on the loose - just a bear raiding your dumpster.
The Colley Matrix digging to the bottom of the information barrel
-OR-
Me watching December bowl games
The Colley Matrix is perfectly objective, but there is room to argue that it isn't perfect. It doesn't matter, for example, which teams on your schedule you beat or lose to, only your overall record and how tough your schedule was. Imagine if Texas played 10 average teams, 1 elite team, and 1 horrible team, going 11-1 overall. That's the information that the Colley Matrix sees. Does it matter if I tell you they lost to the elite team? The horrible one? An average one? Colley stays out of this conundrum because he sees any differentiating between wins as ad hoc. However you justify it, you're introducing opinion if you change your mind based on which team the loss came from. You could say it's Colley's opinion that nothing is objective except wins and losses, but I think he would hide behind Occam's Razor and point out that, yes, maybe you're right, but we can't know for sure if you are and this is as simple as it gets. Colley's method is valid and self-consistent, and therefore merits inclusion in the BCS computer rankings. It just produces funky early-season outputs due to the lack of information available. Luckily, that's not supposed to matter until the end of the season. 2008 be damned.
Now, Mr. Billingsley's method... there's a whole 'nother story.
Comments
Bear
"Stability is a factor in teams that win the championship. But if you stabilize on a team that's going to end up short of that, then all you're doing is spinning your wheels in the 45-win range."-----Daryl Morey
by fanoflosingteams on Sep 12, 2011 7:45 PM CDT reply actions
Crap
Now all I can think of is 45>35
http://www.twitter.com/orlansky_40as
http://www.twitter.com/JayMashBON
by 40AS on Sep 12, 2011 8:03 PM CDT reply actions 1 recs
Thanks for the link to the Colley matrix paper
This one was implemented shortly after my obsession with the BCS computer ranking systems ended.
It was a short-lived, but passionate affair.
I am on Twitter @jeffchaley
New BON rule:
Every front page article must have that picture of Case McCoy
http://www.twitter.com/orlansky_40as
http://www.twitter.com/JayMashBON
I like the picture of post-Big 12 Baylor as well.
proud to swim home
by learned hand on Sep 12, 2011 8:34 PM CDT up reply actions 3 recs
Even shorter version
We’re 2-0 and our schedule boasts opponents with an absurd .895 (17-2) record vs teams not named “Texas”. Any algorithm based solely on W/L and full-season SOS, and derives SOS only from this year’s W/L to date, will think we’re awesome with those numbers.
But because that system factors in full-season SOS, rather than SOS-to-date, it’s statistically incorrect to give it any credence at all until the season’s done.
That is all. Go home haters.
Almost
The system factors in only the games already played, not the future games. It’s not explicitly stated, but if you read the full explanation, you can see that the idea is to use previous games only. You can convince yourself by playing around with his hypothetical game adder ma-jig. I made Baylor and Oklahoma both lose to crappy teams and they dropped way down but Texas’s rating was unaffected to the 5th decimal place.
You're right, and no "almost" to it
Guess I should read the manual first, but eesch, matrix algebra gives me cold sweats. Still, the ranking is even more specious than I thought. Apparently we’re number one because we beat Rice, who beat Purdue, who beat W Michigan. And we beat BYU, who beat Ole Miss, who beat S Illinois, who beat an FCS team. I suppose no other team in the nation can claim to be undefeated and chain that many victories together in the slender season to date. Not much cause for celebration.
by Dagga Roosta on Sep 12, 2011 10:25 PM CDT up reply actions
How I've missed Horn Brain
I’d forgotten that you’re a damn fine writer, too. Great post.
75-37-5. Now GTFO.
Pedantic point
No need to iterate – the Colley matrix is positive definite and has a Cholesky decomposition, thus the rankings can be found via direct solution (using sparse decomposition techniques for computational efficiency). More than you wanted to know, I guess.
by BurntOrange&Blue on Sep 12, 2011 9:06 PM CDT reply actions
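A rough sketch of that direct route, for anyone who wants to see it: build the Colley system C r = b and solve it once with a Cholesky factorization. It assumes the matrix definition from Colley's paper (2 + games played on the diagonal, minus the number of meetings off the diagonal, and 1 + (wins - losses)/2 on the right-hand side). The names `colley_system` and `cholesky_solve` are hypothetical, and this is a plain dense factorization in pure Python, not a sparse one.

```python
import math


def colley_system(teams, games):
    """Build the matrix C and vector b from (winner, loser) pairs."""
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1.0  # one more game played by each side
        C[l][l] += 1.0
        C[w][l] -= 1.0  # one more meeting between the two
        C[l][w] -= 1.0
        b[w] += 0.5     # running value of 1 + (wins - losses) / 2
        b[l] -= 0.5
    return C, b


def cholesky_solve(C, b):
    """Solve C x = b for symmetric positive definite C via C = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


if __name__ == "__main__":
    teams = ["A", "B", "C"]
    C, b = colley_system(teams, [("A", "B"), ("A", "C"), ("B", "C")])
    print(dict(zip(teams, cholesky_solve(C, b))))
```

For a toy schedule where A beats B and C, and B beats C, this gives ratings of about 0.7, 0.5 and 0.3 in a single solve, with no iteration involved.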
That's what I was trying to avoid saying
By pointing out it was a system of N equations. Nerds like us can go read the white paper.
Fantastic article.
Feel like explaining why RB is such a terrible ranking system sometime? I think it might have been you that wrote a pretty scathing commentary on it explaining how it was dropped from the rankings 2/3rd of the time or something.
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 12, 2011 10:22 PM CDT up reply actions
It was me.
He averaged being dropped from the poll more than half the time. Not sure why they keep him around. He also openly claims that his poll should just “know” that the Florida States and Oklahomas of the world are better than the Baylors and Toledos, all other things equal. I was thinking about writing something more detailed like this about each of the computer polls, but it depends on how much info I can get. Some of them won’t release their formula, which is super sketchy to me.
Super sketchy is right.
We allow computer polls to be completely black boxes to even the NCAA? This is absurd. It’s like allowing politicians to accept campaign money without reporting where it came from. Oh wait…
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 12, 2011 11:07 PM CDT up reply actions
I think at some level
the polls get looked over by the BCS. They have to follow certain rules like not including MOV, so I imagine that that is checked for somehow if there is any sanity in the system. Oh wait…
yep
RB poll may well be extremely silly in many respects, but I do give him some credit for essentially having an open formula and explicitly showing his work (I don’t think the formula is posted, but since each pre and post game number gets shown, I’m sure anyone who cares could probably figure it out).
I’m pretty sure, though not extremely sure, that his “factoring in preseason expectations” has virtually zero impact on his final numbers. Too many intermediate steps and inter-related calculations for it to matter much.
That said, his system basically mimics human thought processes in many other ways, though (which is why it correlates so well). Most notably NEVER going back and saying “well, that week 2 win over team X was actually worthless because it turned out they were awful” or “well, that week 3 win over team Y was actually awesome b/c they ran the table the rest of the way.” In the sense that it (somewhat) objectively mimics a human thought process and churns out similar results it’s clearly doing something that at least arguably has value.
It’s just that if you prefer a system that has any statistical or mathematical basis (or if you care about finding the best team instead of replicating flawed evaluation processes), it seems silly.
As a final point, the primary reason the other five systems correlate so well is that they’re basically the same model. As far as I can tell, their approaches tend to be VERY similar, so it’s not stunning their results are as well. My $0.02 is that it’d be better if there was more variety in both approaches taken and variables used (i.e. letting at least some models factor in HFA, actual scores, etc.), and that it’d be better if systems had to prove that they were valid using actual data instead of publishing math-heavy white papers that mainly confuse readers.
It wouldn’t be THAT hard to arrange a number of tests, ranging from predictive to various retrodictive standards. It also wouldn’t be hard to test the models against “dumb” systems like simply ranking teams based on win %, since presumably any system worth anything would blow a simple win % calc out of the water. Unfortunately, there isn’t really any appetite for this sort of thing in the current environment.
Probably don't need sparse techniques for a 130x130 matrix
Iterates can also show you interesting things about sensitivity of the metric to SOS of opponents’ opponents’… etc.
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 12, 2011 10:20 PM CDT up reply actions
Old school
Well if the FCS, etc. “collapsed” teams get expanded, the matrix gets bigger. But with today’s computational power, you are right. I am from the old school, when this was not such a small matrix; however, if computational cost is no object, we could just compute eigenvalues and the inverse by cofactors and determinants and get all kinds of “interesting” info ;-)
Hook ’em
by BurntOrange&Blue on Sep 13, 2011 5:55 AM CDT up reply actions
I have a funny story about cofactors (believe it or not)
Professor in my Numerical Analysis class at Texas: Can anyone tell me an algorithm for computing a matrix inverse for a 30×30 matrix?
silence
Professor: YOU (points), name an algorithm.
Student: ….method of cofactors?
P: Thank you, you just told me the stupidest way of doing that. Why? What if I allowed you a computer that could process a flop in the time it takes a beam of light to cross the nucleus of a Hydrogen atom? You still wouldn’t be able to compute the inverse by cofactors in less than the age of the universe.
P: Okay, what about parallelizing? If I allowed every single atom in the universe as a computing unit… you still wouldn’t be able to compute the inverse by cofactors in less than the age of the universe.
Pretty amazing that it doesn’t work for even relatively small matrices!
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 13, 2011 7:10 AM CDT up reply actions 1 recs
That's a fact(orial)!
BTW, I am surprised to hear a UT math prof was that unkind (all of mine were really nice). But I suppose if you want students to compute like adults, you can’t let them compute with minors ;-)
by BurntOrange&Blue on Sep 13, 2011 6:44 PM CDT up reply actions
Turns out 30! is big number!
The professor was Alan Kline, a professor in CS who teaches graduate math classes in numerical analysis. He’s a very jovial guy, and I don’t think that got across in my transcript above. The classroom atmosphere was sufficiently relaxed that his demeanor was not perceived as rude. I hope I’m making sense.
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 14, 2011 1:45 AM CDT up reply actions
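To put a number on "big": a minimal sketch comparing rough operation counts for a 30×30 matrix. The cost model is an assumption made for illustration only: fully expanding an n×n determinant involves n! products, while elimination-style methods take on the order of n³ operations. The function names are hypothetical.

```python
import math


def cofactor_terms(n):
    """Number of products in the full expansion of an n x n determinant."""
    return math.factorial(n)


def elimination_ops(n):
    """Order-of-magnitude operation count for elimination-style methods."""
    return n ** 3


if __name__ == "__main__":
    n = 30
    print("expansion terms:   %.3e" % cofactor_terms(n))
    print("elimination order: %d" % elimination_ops(n))
```

For n = 30 the first count is about 2.65 × 10^32, against 27,000 for the second, and that is for a single determinant; an inverse by cofactors needs one determinant per matrix entry.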
Couple of picky points
First, I can’t speak for all of the computer rankings (they’re not polls, for Pete’s sake!), but I wouldn’t characterize home/away adjustments as “ad hoc” at all. We have really good historical data on what the home field advantage is worth (about 3 points).
Second, in explaining complex performance (humans, organizations), objectivity is overrated. What you want is intersubjectivity.
Third, my biggest problem with the computer ranking folks is that they do stuff that is not reasonable in order to get into the BCS system. The worst is not using margin of victory (MOV) because the BCS folks don’t want to encourage running up the score (or Spurrierism as I like to call it). They could accomplish the same thing by capping MOV at the point at which additional points provide little or no unique information (I am guessing 24 points), or progressively reducing the number of points awarded for each successive point (of course adjusted for HFA; i.e., on your home field the 7th point counts as 7 but the 8th counts as 7/8ths of a point, away the 10th point counts as 10 points but the 11th counts as 10/11ths).
That’s why I like Sagarin’s actual ratings, because he includes one ranking that incorporates MOV and another that doesn’t and then computes the average. It’s a little ad hoc in that sense but that’s not so bad.
by Erasmus Funderburke on Sep 12, 2011 9:13 PM CDT reply actions
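The two MOV ideas above can be sketched roughly as follows. This is one reading of the comment, not any ranking system's actual formula: the cap of 24 is the commenter's guess, and the diminishing scheme is interpreted as "the first `full_points` points of margin count fully, and the k-th point after that counts `full_points / k`", with `full_points` set to 7 at home and 10 away. Both function names are hypothetical.

```python
def capped_margin(margin, cap=24):
    """Clamp a margin of victory (negative for a loss) to +/- cap."""
    return max(-cap, min(cap, margin))


def diminishing_margin(margin, full_points):
    """Credit the first `full_points` points fully, later ones less and less.

    The k-th point of margin beyond `full_points` is worth full_points / k,
    e.g. with full_points=7 the 8th point is worth 7/8 of a point.
    """
    sign = 1 if margin >= 0 else -1
    size = abs(margin)
    credited = float(min(size, full_points))
    for k in range(full_points + 1, size + 1):
        credited += full_points / k
    return sign * credited


if __name__ == "__main__":
    print(capped_margin(35))           # 24
    print(diminishing_margin(8, 7))    # home win by 8: 7 + 7/8 = 7.875
    print(diminishing_margin(11, 10))  # away win by 11: 10 + 10/11
```

Either way, extra points past the threshold add little, so there is not much to gain from running up the score.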
When they dumped MOV, that is when I stopped obsessing over the computer systems
I used to love these things to the point where I wrote my own. (Of course, in retrospect, it ended up looking almost exactly like SRS.)
One of the things that is interesting is that a computer system, when you start to think about it, really has to come from some sort of philosophical perspective. (For example, are you trying to be predictive, simple, etc.) There are many ways to do it, and the thing I now appreciate about the Colley approach is that he openly describes how it works.
I kind of feel like MOV is important. Systems like SRS, which greatly depend on MOV, or the kenpom.com approach in basketball, are pretty good at predicting outcomes of games. But I understand the politics of it, and what dropped margin of victory from all of these systems. (I am not crazy about that; if you are in a game with Florida, that ought to count more than if they beat you by 30.)
I am on Twitter @jeffchaley
Reggieball and I
We are like finely-tuned harmonic oscillators operating in the same environment and exchanging energy through minute periodic perturbations you’d miss in a first-order analysis.
I agree
But it seems more easily abused than it is in basketball, not only because of the smaller number of games but also the nature of the game itself. I understand they want to avoid situations where people run up the score or try desperately to score in blowouts to make the score look prettier. Not all 14-17 point losses are created equal. Perhaps the inclusion of a “close game” metric that Football Outsiders uses can avoid counting useless scores.
by TheElusiveShadow on Sep 13, 2011 12:21 AM CDT up reply actions
The home/away advantage
Of 3 points is justified by data, but it’s ad hoc to assume that the average adjustment is the same everywhere. Ad hoc doesn’t mean bad, either. Colley’s approach is just to avoid any ad hoc reasoning and he’s consistent about it. I think it’s very pure and mathematically elegant, and it maximizes on strict objectivity, the one thing that humans really can’t do.
That doesn’t matter, though, because in reality the bigger question is whether you want a rating that rewards performance or predicts it. This is a pure reward scheme, but there are “power poll” ideas out there that focus on prediction. That’s a bigger gripe than anything else, and that’s why they average multiple computer schemes instead of just agreeing on one that’s “right”.
The BCS polls are clearly meant to be predictive.
At least, in that their intent is to determine the best candidates for a championship game. I think there are inherent problems with this, namely:
1. The set of football teams is not “well-ordered” in quality. You have no transitivity.
2. The set of data from which you sample is too sparse to be considered a good sample.
This gets very much close to the philosophical question of whether the team that is crowned the national champion is the “best team”, whatever that means. In an ideal world, a football game would be played an uncountable number of times and a PDF would be populated. This would be absolutely no fun, and at some point you just have to let that go. We are human beings responding emotionally to the whims of a random number generator. That’s part of what makes this all fun.
Greg Davis haikus; a lot like his offenses; always go sideways.
by pleaseplaykindle on Sep 12, 2011 10:28 PM CDT up reply actions
I don't think that means they're predictive, either.
If a team goes 0-11 because all their starters are injured, then they return for the final game and win 1000-0 against the, until then, clear best team in the country in the last game of the season, do you put them in the NC game? They are clearly the best team and would destroy anyone else, but they don’t deserve to go. You reward the two teams with the best seasons the opportunity to compete for the championship. You’re not saying that some other team out there isn’t better, you’re saying they have the best resume and have earned a shot at the title.
There’s the rub. How much of the championship game appointments should be based on rewarding a good season and how much should be based on who you think has the best team? Personally, I think it would be utterly boring to give the NC to a team for purely what it IS and not what they DO on the field, so I lean towards reward. However, I can see how in some cases (take Boise State’s many unrewarded undefeated seasons, for example) it isn’t possible for a team to have as good a season as they could given a better schedule, so I’m for expanding to a 4 or 6 team tournament at the end of the year to try and include teams like that. I think that gives everyone a chance to prove themselves to a reasonable standard before reaching the title game.
Yep, that is the rub
And I don’t think there is any easy, straightforward way to address this to everyone’s satisfaction. I kind of prefer the old way: 2 polls, one before the bowls, everybody gripes about them, JoePa chickens out of the Cotton Bowl to play mizzou and gripes about not getting to be No. 1, Texas gets to be No. 1 with a loss in the last game …, oh yeah, and using sparse decomposition methods on any matrix bigger than 100×100!
Hook ’em
by BurntOrange&Blue on Sep 13, 2011 7:33 PM CDT up reply actions
Sorry, I guess I was thinking of the negative connotation of ad hoc
(as in by the seat of one’s pants).
At any rate, I would still like to see some sort of sensitive, quasi-objective HFA included… tier it for true home field (yes, it may differ at various fields but a low-reliability adjustment is still better than no adjustment), advantageous sites (when Texas plays UCLA in Dallas, for instance, or Rice agreeing to move their home game to Reliant), and true-neutral sites (TX/OU). My guess is things like GIS can help to make these types of distinctions more accurate.
As for whether it is useful to strive for complete objectivity, I will leave that to the philosophers. However, there is a lot of mixed evidence suggesting that humans are actually pretty good at making most decisions (your undergrad social psych class be damned), and some of the best systems are hybrids (such as the BCS).
by Erasmus Funderburke on Sep 13, 2011 2:05 PM CDT up reply actions
Shoot
I should add that I agree with 95% of your article. It is really disheartening listening to and reading the downright unintelligible descriptions of the rankings and the BCS as a whole. It’s a very good system that is, in many ways, dragged down by an antiquated conference system and silly tie-breaker rules (Hello 2008 Big 12).
by Erasmus Funderburke on Sep 12, 2011 9:21 PM CDT reply actions
Thanks
I agree with you on the ranking scheme, but the selection procedures and champion determination could use some work.
Haha...
…when the dentally-impaired among us…
Call me. I’ll get you fitted with some dentures ASAP.
by ElMariachiLoco on Sep 12, 2011 10:19 PM CDT reply actions
Seeing a
bear standing that erect out in the woods through some brush I’m thinking Sasquatch for sure.
I know it’s worth nothing, but suck it Aggie, we’re #1.
I’m still pissed
that OU went to the Big XII title game then on to the BCS championship game because of computer polls.
"Stability is a factor in teams that win the championship. But if you stabilize on a team that's going to end up short of that, then all you're doing is spinning your wheels in the 45-win range."-----Daryl Morey
by fanoflosingteams on Sep 12, 2011 10:53 PM CDT reply actions
It's early, but
This article is better than most Rob Neyer articles, who once wrote an article that was better than a Bill James chapter, which was better than Jon Voight’s racist op ed, who was in Pearl Harbor with Lisa Ross, who was in We Married Margo with Kevin Bacon, who dances better than Woodward or Bernstein;
therefore, this article is currently #1 on my article computer ranking poll.
by bevosbackside on Sep 13, 2011 12:13 AM CDT reply actions 2 recs
See? This guy gets it!
I hitch my wagon to him and will defend against anything negative anyone says about him or his reasoning.
I feel dirty. Like I just started a political party…
I know that feeling, you know.
Great stuff, brain.
Thank you
I was both amused and dismayed by the fuss over this computer poll just TWO games into the season. Our education system is really failing, it seems.
by TheElusiveShadow on Sep 13, 2011 12:17 AM CDT reply actions
What a great thread
Comments were as great as the original post. This community kicks ass.
75-37-5. Now GTFO.
Great Logic
I need this formula for proof that Shipley should be our starting QB. He has a 100% accuracy rating and has the longest yards per pass of any of our QBs.
Now we just need to figure out how we get him to throw the ball to himself since he is our best receiver.
