Playing The Numbers Game: Predictors of Success in Texas Games, 2001-06
This entry marks the first of what will be a weekly column from esteemed BONer billyzane. With a focus on looking at CFB statistics in general, and those related to Texas in particular, BZ's column will offer a thoughtful weekly reflection on the game we love. Not all entries will be as in depth as this opening foray, but I urge each of you to print and take the time to read this one carefully. There's some great data to mine through. --PB--
I believe in statistics. I think you should too. There are problems with them, to be sure. They can be horribly confusing and, if misused, can be misleading. And they certainly can never tell the whole story themselves. But these problems are not inherent to statistics, only to the application of them. The goal of this column will be to, every week, go beyond the numbers that you’ll find on ESPN and on many blogs to find something that needs explaining and to explain it.
That's Playing the Numbers Game.
[Insert hokey theme music of your choosing here - I'm partial to anything that starts with a rolling piano.]
It should be noted that I’m not a statistician. I haven’t taken a real math class since Calculus in high school. But I totally dominated that AP test (think Aaron Lewis’ domination of that other AP in last year’s OU game), and then majored in sociology (among other things) at UT, which means I took at least two whole classes having to do with statistics. Yeah, those classes were mostly filled with athletes and Tri-Delts, but it was a real major, I swear. No, seriously. Whatever, jerks. Anyway, let’s see, what else? Oh yeah, I used to know how to do a linear regression. So in conclusion, I’m practically over-qualified. Regardless, not every week will be as numbers-intensive as this week. I promise not to geek out too much....after this week. This week I totally geeked out.
![]() |
| How the hell do you do a linear regression? Wikipedia says that this equation has something to do with it. Don't worry, this won't be on the test. |
This week, I decided to take a look at what statistical performances are the greatest predictors of success (i.e. winning) in Texas games since 2001. Why 2001? Because I consider that the beginning of the modern Mack Brown Era. That year signaled that Texas was no longer rebuilding from the Mackovic Era, but had in fact been rebuilt. Also because data entry for this exercise was a pain in the ass and took forever. That too.
And what do I mean by predictors of success? It’s fairly simple. Which performances by winning teams most closely correlate with winning football games? We’re going to look at this in the realm of performances relative to the teams that lost the games. So, for instance, scoring more points than your opponent correlates 100% with winning football games. If you win that statistical category within a single game, you will always win that game. But that’s obvious. What about other statistical categories? That’s what this column is about.
As a side note, I totally cribbed this idea and the basic methodology from SMQ. He did something similar to this for every conference in 2006. Here’s his post for the Big XII. I have included more potential predictors of success and have tried to go deeper in the analysis to give more meaning to the numbers. Also, as I said, I have only included Texas games in my data – all of them – from 2001 to 2006.
Methodology
![]() ![]() |
| You know who wouldn't skip the methodology section? Law and Order |
You’re more than welcome to skip this if it doesn’t interest you. Skip down to the next section. But keep in mind that examining methodology is the only way you can really tell if a statistic is properly constructed to determine what it claims to determine. With that said, here’s what I did.
I took 20 statistical categories that could reasonably be construed as predictors of success – the more often you do better than your opponent in that category, the more often you win. For every Texas game between 2001 and 2006, I went through and gave a "win" to a category if the winner of the game also "won" that category, and a "loss" to that category if the winner of the game did not "win" that category. For instance, take the 2005 Ohio St game. Texas won the game and finished with more total offensive yards than tOSU. Thus, for that game, the category "total offense" gets a "win" – that is, it correlated with winning the game. In that same game, however, tOSU finished with a better red zone scoring percentage. Thus, that category gets a "loss" because it did not correlate with winning the game. If the teams tied within a certain category, I ignored it.
I then got the "winning percentage" for each category for each season by adding up the "wins and losses" for each category. I then averaged those over the course of the 6 years. I also took some standard deviations to see how reliable those averages are, but I’ll get to that in a bit. Please remember that these categories are mostly from an offensive point of view, but that they are the exact same if you look at them from the defensive side. So when I say that "Net rushing yards" has a winning percentage of 85%, I mean that the winning team has more net rushing yards than the losing team 85% of the time. I could also say that the winning team gives up fewer rushing yards on defense than the losing team 85% of the time. It’s the same statistic.
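The tallying described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used for the column: the `games` rows, every number in them, and the helper names `category_result` and `category_win_pct` are made up for the example.

```python
from statistics import mean

# Hypothetical rows, one per game. Every number below is invented for
# illustration; none of it is real box-score data.
games = [
    {"season": 2005, "winner": "Texas", "total_offense": {"Texas": 450, "Opponent": 300}},
    {"season": 2005, "winner": "Texas", "total_offense": {"Texas": 380, "Opponent": 410}},
    {"season": 2005, "winner": "Texas", "total_offense": {"Texas": 350, "Opponent": 350}},
    {"season": 2006, "winner": "Opponent", "total_offense": {"Texas": 300, "Opponent": 420}},
    {"season": 2006, "winner": "Texas", "total_offense": {"Texas": 500, "Opponent": 250}},
]

def category_result(game, category, higher_is_better=True):
    """Return 'win', 'loss', or None (tie) for one category in one game."""
    (team_a, a), (team_b, b) = game[category].items()
    if a == b:
        return None  # ties are ignored, as in the column
    a_leads = a > b if higher_is_better else a < b
    leader = team_a if a_leads else team_b
    # The category gets a "win" when its leader also won the game.
    return "win" if leader == game["winner"] else "loss"

def category_win_pct(games, category, higher_is_better=True):
    """Percentage of non-tied games in which the game winner also won the category."""
    results = [category_result(g, category, higher_is_better) for g in games]
    wins, losses = results.count("win"), results.count("loss")
    return 100.0 * wins / (wins + losses) if wins + losses else None

# One percentage per season, then the average of the season percentages.
by_season = {}
for season in sorted({g["season"] for g in games}):
    season_games = [g for g in games if g["season"] == season]
    by_season[season] = category_win_pct(season_games, "total_offense")

average = mean(list(by_season.values()))
print(by_season, average)
```

A category where lower is better, such as penalty yards, would pass `higher_is_better=False`.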
The Data
WIN PERCENTAGES FOR EACH CATEGORY BY SEASON AND AVERAGED

WHAT THIS DATA MEANS AND WHAT IT DOESN’T MEAN
• These winning percentages are NOT the percentage of time Texas wins games when it wins the category. Saying that Texas is 60-0 when it out-rushes its opponent is not valuable because we don’t know what happens when Texas’ opponent out-rushes them. If Texas is also 60-0 when getting out-rushed, then rushing yards just don’t have very much to do with winning games.
• These winning percentages are instead a combination of these two statistics. It’s the percentage of time that the winning team led the losing team in this category. So if Texas out-rushes its opponents 60 times and wins every time, and Texas’ opponents out-rush them 60 times and the opponents win every one of those games, then the winner of the rushing battle had a 100% winning percentage in the games. That is what this number means.
• Remember also that these are correlations, not causations. A high percentage in one category does not necessarily mean that doing well in that category causes the team to win. It merely means that when you do well in that category, you also usually win. There may be a third variable that simultaneously causes these two results.
• Causation is very difficult to prove, given 2 problems here: 1) small sample size, and 2) regressions are hard and generally require software, and I can’t be bothered for something that wouldn’t be conclusive with our small sample size. Instead, we’re going to take our correlations and apply them to what we already know about football to try to come up with likely causations.
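To make the difference between the first two points above concrete, here is a small sketch using the hypothetical 60-game scenarios from those bullets. The tuples and the function names are illustrative assumptions, not the column's data.

```python
# Each game is (texas_won_game, texas_won_rushing). These are the
# hypothetical scenarios from the text, not real results.

def texas_record_when_outrushing(games):
    """Texas' win percentage in games where Texas out-rushed its opponent."""
    outcomes = [won for won, outrushed in games if outrushed]
    return 100.0 * sum(outcomes) / len(outcomes)

def category_win_pct(games):
    """Share of games in which the game winner was also the rushing leader."""
    # The winner led in rushing exactly when both flags agree.
    matches = [won == outrushed for won, outrushed in games]
    return 100.0 * sum(matches) / len(matches)

# Scenario A: Texas wins all 60 when out-rushing AND all 60 when out-rushed.
scenario_a = [(True, True)] * 60 + [(True, False)] * 60
# Scenario B: whoever out-rushes the other wins, 60 games each way.
scenario_b = [(True, True)] * 60 + [(False, False)] * 60

print(texas_record_when_outrushing(scenario_a), category_win_pct(scenario_a))  # 100.0 50.0
print(texas_record_when_outrushing(scenario_b), category_win_pct(scenario_b))  # 100.0 100.0
```

In both scenarios Texas is 60-0 when it out-rushes its opponent, but only the second number separates a category that tracks the winner from one that does not.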
Analysis
THE BEST PREDICTORS OF SUCCESS FROM 2001-2006

ARE GAMES WON IN THE TRENCHES?
We should not be surprised that having more total offense than your opponent has such a high correlation with winning games. More offensive yards than your opponent will very often lead to more offensive points than your opponent, which is the definition of how to win.
The only other categories with an 80% or higher win percentage are Net Rush Offense and Sacks. This does seem to reinforce the typical coachspeak that games are won in the trenches. Further, in the last 6 years of Texas games, a team that has had more sacks and more rushing yards than its opponent has won 89.36% of the time. We have to wonder, however, how much of this is related to the fact that teams that are already winning rush more often and teams that are already losing pass more often (and thus get sacked more often).
Well, if you look at the correlation of winning to having more passing yards than your opponent, it’s relatively low, at only 67.53%, ranked number 14 out of 20 on our list. Yards per completion has the same correlation. This could tell us one of two things. First, it could indicate that yes, losing teams pass more and that’s why passing yards doesn’t correlate with winning very well. But it could also potentially tell us that passing just really isn’t that important to winning. Games are won in the trenches and teams that run better usually win. Which is it?
The answer, I think, lies in the correlation of yards per pass attempt to winning games, which is 77.92%, ranked number 6 on our list, a full 10% better correlation than both other passing statistics. Why do yards per pass attempt correlate so much better to winning than yards per pass completion? I think it’s because teams that are losing throw more often and thus throw many more incompletions, which lowers the yards per attempt statistic for the losing team, but not the yards per completion statistic. Thus, losing teams will often have more yards per completion than the winning team, but fewer yards per attempt.
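The arithmetic behind that argument is easy to check with a toy example. The two passing lines below are made up purely to illustrate the effect (they are not from any Texas game), and the function names are just for this sketch.

```python
def yards_per_completion(yards, completions):
    return yards / completions

def yards_per_attempt(yards, attempts):
    return yards / attempts

# Made-up passing lines, chosen only to illustrate the effect.
winner = {"completions": 15, "attempts": 20, "yards": 210}
loser = {"completions": 20, "attempts": 45, "yards": 300}  # throwing to catch up

for name, line in (("winner", winner), ("loser", loser)):
    ypc = yards_per_completion(line["yards"], line["completions"])
    ypa = yards_per_attempt(line["yards"], line["attempts"])
    print(f"{name}: {line['yards']} yards, {ypc:.2f} per completion, {ypa:.2f} per attempt")
```

Here the losing team has more passing yards (300 to 210) and more yards per completion (15.00 to 14.00), but its 25 incompletions drag its yards per attempt down to 6.67 against the winner's 10.50.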
This seems to indicate that the low correlation of passing yards to winning games has more to do with the fact that losing teams pass more often to catch up (and thus get sacked more often) than the idea that games are won in the trenches. Further supporting this is the correlation of yards per carry to winning, which is only #8 on our list at 76.62%, about 9% less than net rush offense. That is, running efficiently doesn’t have nearly as much to do with winning as running often. Teams that are already winning certainly run often to control the clock, but they don’t necessarily run efficiently.
Thus, it doesn’t seem that dominating the trenches wins football games, but rather that winning football games creates statistics that imply the winning team was dominating the trenches. And who knows, perhaps why the winning team got ahead in the game to begin with had a lot to do with passing the football well.
OTHER INTERESTING ODDITIES
As I expected, being the home team had very little to do with who won in games Texas plays, coming in at 54%, good for only #18 of 20 on our list. This probably has something to do with the fact that most of Texas’ toughest opponents (OU, Bowls, Big XII Championship game) are played on neutral sites and thus don’t figure into this analysis. However, I think it’s mostly the fact that Texas is so good, year-in and year-out. They’re good enough to be impervious to the dangers of playing on the road.
Is getting off to a fast start important? The short answer is, yeah, but not THAT important. Scoring first only has a 70.13% correlation with winning, #13 on the list. However, leading after the first quarter has a much more substantial 77.92% correlation, #7 on the list.
![]() |
| Penalty Flags? The more the merrier! |
Perhaps most surprisingly, having fewer penalty yards than your opponent actually has a negative correlation with winning football games. Teams that have fewer penalty yards than their opponents actually win only 46.75% of their games. This isn’t an anomaly either: SMQ came up with a 43.5% correlation for the entire Big XII in 2006. This seems to imply that being a "disciplined" team that doesn’t "kill itself" with penalties has absolutely nothing to do with winning football games. Obviously, there are game situations in which a penalty can hurt you a lot. But overall, avoiding penalties isn’t a big deal. In fact, there seems to be a very loose correlation between being penalized and winning games. Why? I would guess aggressiveness, particularly on defense, but that’s just a guess. Ideas?
Feel free to talk about anything else interesting in the comments.
YEAR BY YEAR RESULTS

To come up with these numbers, I took the average of all 20 winning percentages for each year. What really stands out to me in these numbers is how large the win percentages are for the years 2001 and 2005 (both above 76%) compared to the other 4 years. What do these two years have in common? The two Texas teams in 2001 and 2005 were arguably the most dominant of the Mack Brown Era (an argument can be made for 2004 over 2001, but I think that argument is a losing one).
So what does this tell us? I think it says two things. First, it reinforces how dominant these two teams were. Dominant teams should not only win, but also win most of the statistical categories while winning the game. A high "Average Category Win Percentage" shows a team doing just that (and alternatively, when they lose, they lose most of the statistical categories also). Second, this tells us that these categories are reliable. If these 20 categories really are predictors of success, we would expect the most dominant teams to win the highest percentage of the categories. Which is exactly what happened.
Reliability of This Data
A NOTE ON THE STANDARD DEVIATIONS
[Feel free to ignore if you don’t care. Skip to The Bottom Line, below.]
![]() |
| Hey Everyone, Math "Humor"! OMG! LOL! WTF? |
Standard Deviations basically say how much deviation there was among the numbers that were averaged to get the "Average Category Win Percentage." If we average 10 numbers, which are all 50%, the average is of course 50%, and the standard deviation is 0% (every number we averaged equals the average so there was no deviation from the mean). If we average 10 numbers, 5 of which are 0% and 5 of which are 100%, our average is still 50%, but the standard deviation is 50% because every number we averaged was 50% deviant from the mean (0 and 100 are both 50 away from the mean).
The smaller the standard deviation, then, the closer to the mean the numbers are – which makes the average itself more reliable. See what I mean? If every observation we have is 50%, then the average of those observations (50%) is very reliable – a standard deviation of 0%. But if we have 10 observations, 5 of which are 0% and 5 of which are 100%, our mean of 50% (while the same as the other mean) isn’t very reliable (a Std. Dev. of 50%) because the observations we got varied so wildly from each other (and thus, by definition, from the mean).
The Bottom Line: the lower the standard deviation, the more reliable the average win percentage is for that category.
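The two toy examples above can be reproduced with Python's standard library. One assumption to flag: the column doesn't say which formula was used for the real tables. The 0% and 50% figures in the examples match the population standard deviation (`pstdev`); the sample version (`stdev`) gives a slightly larger number for the second case.

```python
from statistics import mean, pstdev, stdev

# The two toy data sets from the text, in percent.
all_fifty = [50] * 10
split = [0] * 5 + [100] * 5

# Same mean, very different spread.
print(mean(all_fifty), pstdev(all_fifty))
print(mean(split), pstdev(split))

# Sample standard deviation of the second set, for comparison
# (divides by n - 1 instead of n, so it comes out a bit above 50).
print(round(stdev(split), 2))
```

With only six seasons per category, the choice between the two formulas changes the numbers somewhat, though not the ordering of which categories are steadier than others.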
The standard deviations in the year-by-year don’t matter too much for us. They’re there for you to look at if you care, but they are important for the Win Percentages by Category that we used in the analysis section.
RELIABILITY OF THESE WIN PERCENTAGES BY CATEGORY

The first thing that jumps out at me is how high the standard deviation for Net Rush Offense is. So I looked at the data. From 2001-2006, it has had anywhere from a 100% correlation with winning games (2004) to a 61.54% correlation (2002). This seems to me to back up my earlier conclusion that good Net Rush Offense statistics are the result of winning, not the cause of it.
The 2002 team went 11-2, but was hardly dominant in amassing those wins, having to come back to beat some teams like Kansas State and Nebraska (both those teams outgained Texas on the ground despite Texas winning the game, as did North Texas, Texas A&M, and LSU). In the games Texas had to come back to win, they had more passing yards than their opponents and fewer rushing yards, which reinforces my earlier conclusion. Furthermore, the two most dominant teams of this sample, 2001 and 2005 – teams that regularly got out to huge leads and then held on – both had Net Rush Offense categories that correlated with winning 92.31% of the time and Yards per carry categories that correlated at much lower percentages (76.92% in 2001 and 84.62% in 2005). I’m convinced.
Moving along, as you can see, penalty yards has a low standard deviation of 8.49%, meaning that the average of 46.75% is pretty reliable. This just flies in the face of everything we’ve been told by coaches and pundits.
And once again, predictably, Total Offense has the lowest standard deviation (by a lot) because it’s most obviously directly related to scoring more points than your opponent. That number is pretty reliable.
Anything else you guys can think of?
--BZ--
Comments
My head is spinning...
Outstanding stuff BZ - brings back memories of statistics class. It's very interesting to see what carries more weight over the long-term. The most surprising is the penalty yards - like you said it totally flies in the face of everything we're told. Hopefully Robert Killebrew won't get wind of this...he might step it up a notch.
by JT Longhorn on Sep 4, 2007 8:13 AM CDT
I don't suppose your data covers...
which penalties had a better correlation, because I would assume that some actually do have a negative correlation to victory, like false starts perhaps.
penalty yards
could be a result of playing the B team more in blow outs, who, you would guess would have more penalty yards because they are generally young, inexperienced or just not as good.
that might be some of it
but if you look at the two most dominant years in which texas played its second stringers the most (2001 and 2005), both those teams have the highest observed correlation of less penalty yards to winning. With your theory, you would expect the opposite to be true.
But I don't know. This is as good a theory as my "defensive aggressiveness."
Something to keep in mind
Texas has dominated its opponents in the 01-06 time frame. Thus, any category that Texas has been good at will be considered a good predictor by your data.
For 04-05 go check a new category "rush yards by the QB" and see how important that category is :)
good point - that's very true
that's why I gave you all the winning percentages for each category for each year. you can parse out what each team was best at each year and draw your own conclusions.
also, i worked this out under the assumption that texas would remain dominant for the foreseeable future and thus these would continue to be good predictors of success.
It's all very interesting
So why the difference between Fumbles and Interceptions?
Is it because interceptions happen downfield relative to fumbles, so the recovering team usually has further to go to score?
Why the big difference between "Yards per attempt" 77.92% and "Yards per Completion" 67.53%
Does trying to throw long passes pay off better than actually completing a long pass?
Another category could be just plain jane yards per play. it might come in at 2nd or 3rd spot.
A category that might be interesting is trying to correlate ball control drives and fast striking drives and see which has a greater effect on winning.
There also never seems like enough defensive stats.
and how they compare with offensive stats.
So you could answer basic questions like does defense or offense win games or at least what are the correlations.
i have no idea
why fumbles and interceptions are different. their standard deviations are big enough that it's possible that they're statistically quite similar. but then again, their cumulative effect (turnover margin) is greater than either of them, so maybe there's something there. I don't know.
As for yards per attempt and yards per completion, I tried to explain that in the "are games won in the trenches" section. I think that's certainly a plausible explanation.
As for the defensive stats, as I said in the column, these are all both offensive and defensive stats. "total offense" just means having more offensive yards than your opponent. I could call it "total defense" and that would mean giving up fewer offensive yards than your opponent. But it's the same statistic because it's relative to what your opponent is doing.
More thoughts
On turnovers, fumbles certainly give you better starting field position than an interception since most fumbles occur around the line of scrimmage whereas most INT's occur somewhere down the field. Throw in the number of "hail mary" type interceptions and I would expect the impact of INT's as a predictor to deviate downward as compared to fumbles, which leads to this thought: What about overall return yardage (the "hidden" yardage in many games)? I noticed above that you had kick returns, but it seems that the overall return yardage (which would include punts and turnover returns) might be a better indicator.
Yards per attempt
Yards per attempt is an indicator of how efficient your pass game is. It shows that you are throwing the ball downfield more and hitting more big plays in the passing game. An offense like Tech's gets a lot of pass yards, but it takes a lot of attempts to get there and your ypa is lower. Historically, an effective passing offense would have a ypa of 7.5 yards or more. YPA should be a better indicator of wins than passing yardage or yards per completion.
What about Yards Per Play?
by Red Blooded @ Burnt Orange Nation on Sep 4, 2007 5:25 PM CDT
well, sort of...
the only difference between yards per completion and yards per attempt is that the latter includes incompletions. a large yards per completion average also indicates that "you are throwing the ball downfield more and hitting more big plays in the passing game," as you say.
the more incompletions you have, the lower your yards per attempt will be relative to your yards per completion.
I think you get this, but it was a little confusing based on how you said it.
Very interesting
I'm assuming a sample size of about 60? If so, it would be hard to do a multivariate analysis on this data but it might be worth taking a look. I'm not sure how, but if you can send me the data I have the knowledge and access to software to perform such an analysis. In any case, fascinating stuff--BZ you were well-trained as a sociologist. Also, PB, I was hoping something like this would pop up. Thanks.
by jukey on Sep 4, 2007 9:07 AM CDT
Geeking out
If BZ sends you the data and you get to use a software tool (like MiniTab), then you might try leaving the data as discrete. You could then factor in ties (a push) which were intentionally ignored in this analysis. Discrete analysis may also bring about more meaningful multi-variable effects.
You might also try re-analyzing the data set using only Texas' outcome. Since only 1 team is common throughout all the data sets, this might prove more statistically valid.
BZ- To help with your questions about the chicken/egg (or winning team rushing vs. trailing team passing), the best way would probably be to analyze the same data after the 1st 3 Quarters of play in each game. Most teams don't start playing "burn the clock" or "catch-up" until the 4th Qtr.
Disclaimer if any of this proves to be bad advice: I took 2 less Stats courses in college than BZ. But I did stay at a Holiday Inn Express last night.
chicken/egg
that is a good idea and i thought about it, but there just isn't any easy way to find/input that data divided up by quarters. I know it exists because box scores have play-by-play listings. But with almost 80 games to do, unless there was just a number in the box score that had the information, I just didn't have the time or patience to add stuff up.
But if I run out of column ideas in the future (a very distinct possibility), then I might revisit this.
Understandable
If it were easy (and I weren't so lazy), I would have done it for you. But unfortunately, that's not the case.
Looking forward to your future statistical endeavors.
yeah, sure
if you want to do it, then by all means, be my guest. I remember very little about this stuff, but I'd love to see what you could come up with.
post your e-mail address or send it to PB and he can get it to me.
and thanks for the kind words.
ooh, interesting
Texas "won" the following categories, in order of importance: sacks; 3rd down efficiency; first quarter lead; red zone scoring %; first to score; home team; kickoff return average.
Arkansas State won every other category except fumbles lost, which was even (neither team lost any fumbles).
Some other ideas for the BON stats army
The factors that drive Texas' success as a program are not what it does against weaker opponents but what it does against ranked teams and rivals. UT almost always outrushes the weaker teams on its schedule even when it isn't particularly good at running the ball (like in 2003).
It would be interesting to see what predicts the outcome of games against OU, A&M, non-conference BCS teams (like Arkansas, Ohio State), Nebraska, bowl opponents. Or you could expand this list to include all Big 12 teams except maybe Iowa State and Baylor in order to increase sample size.
I don't know if the data are available, but if so, you could test MB's theory that it is all about explosive plays, ie. the difference in explosive plays.
Team With Most Points Wins 100% of Time
Still testing this theory - but so far it has been holding up pretty well!
Bah, you beat me to that one.
Although, I would have phrased it in a more thought-provoking ESPN type analysis, ala Herbstreit:
Total points allowed; points per quarter; weighted scoring efficiency (a FG is normalized to 3/7 of TD+PAT); etc....
What if you re-run your data . . .
Presumably you have all of this data in your spreadsheet or database where you can go back and sort for different indicators. If so, here is what I think might be revealing in some way (I'm not quite sure how at this point). Let's say that we all agree that the 2005 and 2001 football teams were undeniably dominant. While obviously interesting and helpful for identifying those teams as being dominant, keeping the data from those two years in the calculations actually skews all of the data.
Why not re-run the data WITHOUT 2005 and 2001? Then the universe of data is not all Longhorn games for all the years, but rather the data of all non-dominant Longhorn teams. It just might be that after filtering and analyzing the information, something useful might jump out.
Just a thought.
Suggestions
Take this for what it's worth, but when you're dealing with all statistics that are not independent, statistical regressions are not gonna give you very good numbers.
To get a good regression out of that data you probably wanna test how correlated different variables are, and that should be a giant pain in the ass.
by fathead on Sep 4, 2007 9:17 PM CDT
And also
For that matter, what you did isn't quite a linear regression, and making a linear regression would be kinda difficult, because your possible result values should only be 1, winning, and 0, losing.
So you need some sort of method that allows you to predict whether you will choose 0 or 1 (choose in a metaphorical way)
You can use a discrete choice model to do this. I'm gonna try to think how to apply it, because it's not very straightforward, but i think it may be doable
by fathead on Sep 4, 2007 9:25 PM CDT
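For readers wondering what a model with a 0/1 outcome might look like, here is a minimal sketch of a binary logit (logistic regression), which is one common kind of discrete choice model. It is not something anyone in this thread actually ran. The rushing margins and outcomes are invented, the single-predictor setup is a simplifying assumption, and the names `fit_logit` and `win_probability` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.1, steps=5000):
    """Fit P(win) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def win_probability(x, b0, b1):
    return sigmoid(b0 + b1 * x)

# Invented data: rushing margin in hundreds of yards, and 1 = won, 0 = lost.
margins = [1.5, 0.8, -0.4, 2.1, -1.2, 0.3, -0.9, 1.0, -0.2, 0.6]
won = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

b0, b1 = fit_logit(margins, won)
print(round(win_probability(1.5, b0, b1), 2), round(win_probability(-1.2, b0, b1), 2))
```

Unlike the win percentages in the column, a model like this uses the size of the margin and not just who led. With around 80 games and 20 correlated categories, any real fit would need the caution already raised in this thread.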
yeah
I don't know anything about regressions anymore, or really any multivariate analysis. I didn't claim that this was a linear regression that I did. I don't even know how to do one. I'm not exactly a statistician (and by "not exactly," i mean "not at all."); I'm a lawyer. I'm just trying to bring out interesting statistical patterns that I find.
But I'm certainly not trying to bogart the time people on this blog give to stats analysis. So by all means, see what you can come up with. I'll be interested to see the results. You can probably do more with the math than I can.
And I did start to look at how correlated a few of the variables were, but you're right. it was a giant pain in the ass, so I stopped.
sorry
I think upon review my post came off a little dickheaded-ish. So let me rephrase that. If i were to run some numbers and came up with something interesting, would you want me to send it to you? I think that you had a great idea, and it definitely brought up some important questions.
by fathead on Sep 5, 2007 12:33 AM CDT
oh, no worries
I'm fairly certain I come off as dickhead-ish all the time without meaning to. I didn't take offense.
But yeah, if you want to run the numbers, I would be interested. Jukey is doing some sort of multivariate analysis also, so you may want to coordinate with him (he posted his e-mail below).
great analysis & comments, very thought provoking
I appreciate this addition to the site. It's a great reminder of what stats can reveal and, just as important, how stupidly they are often used. Whoever was broadcasting the college world series this year drove me batty with their inane stats inserted at every opportunity, e.g., "in 75% of CWS championship games, the team ahead after five innings goes on to win!" As a statistics prof-friend says, numbers don't have memory. Facile correlations don't tell you anything.
But I digress. BZ, curious if you noticed appreciable differences in stats wrt strength of schedule. Any meaningful differences in years when we played an Ohio State vs. a Central Florida, a scary OU vs. a feeble OU?
re: strength of schedule
i'm contemplating doing a full column on strength of schedule. if I do, I'll try to work this into it.
but just to clarify, do you mean any difference in those specific games against tough opponents or are you wondering whether playing those tough opponents affects how texas performs in these statistical categories in other games those years?
On lies and bikinis
I loved the post but I was reminded of some sayings about statistics.
"There are three kinds of lies: lies, damned lies, and statistics." ~Mark Twain, autobiography, 1904
"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital." ~Aaron Levenstein
Personally, I think the relationship with rushing yards FOR TEXAS is a real factor because UT has never really had an offense that could pass effectively on any defense, unlike some of the spread systems like Texas Tech, Hawaii, etc, for which rushing yards are meaningless.
That is exactly my point
I meant that they could pass effectively "on any defense," that is, against strong defenses when they couldn't run. In general in the Mack Brown era, if Texas couldn't run, they couldn't pass either. Chris Simms' forte was the play action pass to the deep sideline or the post. When UT couldn't run, they often lost because Simms' release was too slow and the system of patterns was always dictated by the assumption of play action, which generated sacks and interceptions.
In the VY years, Texas won when a RB or Young had a good day and lost when neither did, because for a receiver to be open enough for VY to hit him, a safety and/or LB had to be up to spy on VY. Sure, there were games that Texas still won in which they passed most of the time, but most of the games in which they had to make a living passing, they lost.
2006 and 2002 were possible exceptions to this because in 2002 the running game was so bad and in 2006 the defense was so bad (in the last half of the season), the only way the Horns had a chance was to pass. Interestingly, in those years, the team with the most rushing yards won only 3/4 of the time or less.
more the latter
wondering how a tougher schedule overall, one year to the next (subjective, I know) skews the percentages, if at all.
Data Request
BZ,
Please send the data to miller3676@gmail.com. I'll analyze it and report back as soon as possible.
by jukey on Sep 5, 2007 8:45 AM CDT
