I have spent a lot of time this summer thinking and writing about what it takes to succeed at the highest levels of college basketball. A couple of things are inescapable when you start to look into it. The top few programs in NCAA basketball have more talent on their rosters than anyone else, and they gobble up most of the championships.
Generally, we would expect that teams with more highly regarded recruits will be better than other teams. We also expect that teams with a lot of upperclassmen often do well, as these players tend to improve with experience. But how much do these factors matter? How do we weigh the relative effects of having highly regarded recruits (who often bolt to the NBA after a season or two) against having experienced upperclassmen who lack NBA-level talent? I am not sure that I am ready to answer this question just yet, but I am getting there.
I want to be more quantitative in trying to describe how recruiting and experience lead to on-court success. Join me on a fun-filled romp through the data, with lots of pretty graphs. Or just skip to the end to see what I think all of this means for the 2011-2012 Texas Longhorns basketball team, and then start blasting away in the comments section.
Some of the basics of the study
In order to measure on-court performance, I like to use the Simple Rating System (SRS), available at sports-reference.com, as it does a pretty good job of predicting the eventual NCAA champion. SRS gives you the number of points per game a team is over the NCAA Division I average. Previously, I have used rankings derived from SRS, but here I am going to use the SRS points over average values. To give some context, here are the SRS results for last season. The top seven teams had an SRS greater than 20. The 24th highest SRS total was 15. While it varies from year to year, typically the top 5 or so schools have an SRS greater than 20, and the top 25 or so schools have an SRS greater than 15. To give these numbers a bit more context, here is a list of the NCAA champions with an SRS less than 20, going back to the 1980-1981 season:
2011 Connecticut (SRS = 17.95)
2003 Syracuse (SRS = 19.01)
1988 Kansas (SRS = 15.71)
1985 Villanova (SRS = 11.99)
1984 Georgetown (SRS = 18.75. In 1984 only 2 teams had an SRS > 20. Georgetown had the fourth highest SRS in 1984.)
1983 NC State (SRS = 15.22)
I have included the 45 schools with the most RSCI top 100 recruits since 1998 in this study. (This handy list is available at statsheet.com.) My database includes all of the teams and players going back to the 2004-2005 season. It is important to keep in mind that this study does not cover all of NCAA Division I, but only covers the teams that recruit a high number of RSCI ranked players. This means that the results of this study cannot be extended to describe college basketball as a whole, but only describe the results that are typical for these top tier recruiting schools.
Recruiting and SRS
I wrote a few weeks ago about how program success seems to be closely related to recruiting. As a follow-up, I have been looking for relationships between recruiting and team SRS. One of the interesting things that I find with the teams in this study is that there is a correlation between SRS and the % of the available minutes played by top 30 RSCI recruits. The % of minutes played by top 30 recruits for a given team is simple to calculate; you just total up the minutes played by all top 30 recruits on a team, and divide this total by the minutes played by all players on the team. The relationship between SRS and the % of minutes played by top 30 recruits is illustrated in the figure below, where I have also labeled some of the schools of potential interest in the plot. There is a lot of scatter in this plot, which is to be expected. Many other factors contribute to how a team performs; this plot only isolates one of them.
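For anyone who wants to try the calculation, here is a minimal sketch in Python. It is not the code behind the figure; the function name, the tuple layout, and the sample roster are all made up for illustration.

```python
def top30_minutes_share(players):
    """Percent of a team's minutes played by top 30 RSCI recruits.

    players: list of (minutes, rsci_rank) tuples for one team-season.
    rsci_rank is None for a player who was not RSCI ranked.
    (This data layout is an assumption, not the author's database schema.)
    """
    total = sum(minutes for minutes, _ in players)
    if total == 0:
        return 0.0
    top30 = sum(minutes for minutes, rank in players
                if rank is not None and rank <= 30)
    return 100.0 * top30 / total


# Hypothetical roster: two top 30 recruits, one recruit ranked 31-100,
# and two unranked players.
roster = [(1100, 12), (900, 45), (1000, None), (500, 28), (500, None)]
share = top30_minutes_share(roster)
print(share)  # 40.0 (1600 of 4000 minutes)
```

The same function with a different rank cutoff would give the share of minutes for recruits ranked 31-100.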
One thing that is really interesting in this study is that there is virtually no correlation between the % of minutes played by RSCI recruits rated 31-100 and SRS. I was pretty surprised by this. This is not to say that recruiting guys in this range is not important; remember that I am only studying the very top recruiting teams in the NCAA. The teams on my list average an SRS of 13.6, which means the average team from the group in my study is 13.6 points per game better than the NCAA Division I average team. These top 100 recruits likely go a long way towards separating this group of teams from the rest of the NCAA, but they don't do very much to help us discriminate between the teams in this study.
Given the large amount of scatter in the plot of SRS vs. the % of minutes played by top 30 recruits, it probably helps to look at these same data in a different way. I have plotted the data as a histogram. Each color represents a different range of total team minutes played by top 30 recruits. Each bin in the histogram corresponds to an SRS rank, and the height of each bar reflects the percentage of teams with a given number of top 30 recruits that falls into that bin. This graph does a good job of illustrating the downside and upside for teams from each group.
Experience also matters
I can use my database to determine the effect of playing experience on SRS. In general, players are likely to improve as they gain more NCAA basketball playing experience. One crude way of measuring experience is to count the total number of minutes played in previous seasons by players on a given team. Since my database only goes back to the 2004-2005 season, I can't get reliable totals of prior minutes played for teams before the 2007-2008 season. This means that we can only look at the results for the 45 teams that we have over the four most recent college basketball seasons, which gives us a total of 180 data points.
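The experience measure described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the dictionary of prior-season minutes, the player labels, and the function name are hypothetical.

```python
def team_experience(roster, prior_minutes):
    """Total minutes played in previous seasons by players on the roster.

    roster: names of players on the current team.
    prior_minutes: dict mapping a player name to a list of minutes
    played in each of his previous seasons (assumed layout).
    Players with no entry, such as freshmen, contribute zero.
    """
    return sum(sum(prior_minutes.get(name, [])) for name in roster)


# Hypothetical data. The departed player has prior minutes but is not
# on the current roster, so he does not count.
prior_minutes = {
    "Returning guard": [900, 1100],
    "Returning forward": [400],
    "Departed center": [1000],
}
roster = ["Returning guard", "Returning forward", "Freshman"]
experience = team_experience(roster, prior_minutes)
print(experience)  # 2400
```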
The figure below plots SRS as a function of total minutes of experience for a team. We see that there is a correlation between SRS and experience that is about as strong as the correlation between SRS and % of minutes played by top 30 recruits. I have also labeled a few schools of interest.
So both experience and the amount that top 30 players see the court matter. This is not a big surprise. We can learn more from these data using a technique known as linear regression. A typical approach with linear regression is to assume that some property (in this case SRS) is a linear function of several different factors (here % of minutes played by top 30 players and minutes of experience). One then "fits" this function to the data, and comes up with a model that predicts the property in terms of the factors. We can do this with SRS, which gives us a simple equation that predicts SRS as a linear combination of minutes of experience and % of minutes played by top 30 recruits.
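To make the fitting step concrete, here is a small self-contained sketch of a two-factor least-squares fit in plain Python. It is not the software used for this study, and the data are a made-up toy set generated from invented coefficients (5, 0.1, and 0.8) purely so that the fit has a known answer; none of these numbers come from the real database. Experience is expressed in thousands of minutes, which is also an assumption.

```python
def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: one list of predictor values per data point.
    y: the observed values (here, SRS).
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data: (% of minutes by top 30 recruits, experience in 1000s of minutes).
rows = [(0, 4.0), (15, 6.0), (30, 5.0), (45, 9.0), (10, 10.0), (60, 7.0)]
# Noise-free "SRS" built from invented coefficients, for illustration only.
y = [5 + 0.1 * pct + 0.8 * exp_k for pct, exp_k in rows]

intercept, per_pct, per_1000_min = fit_linear(rows, y)
print(round(intercept, 3), round(per_pct, 3), round(per_1000_min, 3))
```

With real, noisy data the fitted line will not pass through every point, and a library routine such as NumPy's `numpy.linalg.lstsq` is a more robust way to do the same job.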
The graph below plots the actual SRS vs. the predicted SRS based on the model. The fit is OK, but not great. This is not really a surprise, as there are obviously a lot of factors that aren't captured by this very simple model. But our model does contain useful information about how much the parameters that we have included influence SRS. (For those interested in technical details, the P values of the coefficients for both % of minutes played by top 30 recruits and minutes of experience were less than 0.001.)
We can make some interesting estimates about the relative importance of minutes of experience and % of minutes played by top 30 recruits in terms of how much each affects SRS. (The key word in the previous sentence was "estimates.") Regression analysis gives us information about how much SRS changes, on average, when we change one parameter while holding the other one constant. In a typical college basketball season, a starter earns about 1100 minutes played, and plays about 15% of the total minutes for his team. Using our regression model, getting a player with a top 30 RSCI ranking into the starting lineup has an effect on predicted SRS that is similar to, though slightly smaller than, that of returning one player who has started for two seasons, or two players who have each started for one season. Adding an additional top 30 recruit to the starting lineup improves the predicted SRS by about 2 points.
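The fitted coefficients themselves are not quoted here, but the round numbers in the paragraph above imply rough values for them. The back-of-the-envelope arithmetic below is only that: a sketch of what "about 2 points", "about 15%", and "about 1100 minutes" work out to, with variable names invented for the example.

```python
# Round numbers taken from the text above.
STARTER_MINUTES = 1100        # minutes a starter plays in a season
STARTER_SHARE_PCT = 15        # % of team minutes a starter plays
SRS_GAIN_TOP30_STARTER = 2.0  # approx. predicted SRS gain per top 30 starter

# Implied SRS points per percentage point of minutes from top 30 recruits.
srs_per_pct = SRS_GAIN_TOP30_STARTER / STARTER_SHARE_PCT

# Two starter-seasons of experience: one two-year starter,
# or two one-year starters.
two_starter_seasons = 2 * STARTER_MINUTES

# The top 30 effect is described as slightly smaller than the effect of
# two starter-seasons, so this is a rough lower bound on the implied
# experience coefficient, in SRS points per 1000 minutes.
min_srs_per_1000_min = SRS_GAIN_TOP30_STARTER / (two_starter_seasons / 1000)

print(round(srs_per_pct, 3))           # 0.133
print(two_starter_seasons)             # 2200
print(round(min_srs_per_1000_min, 3))  # 0.909
```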
As an aside, my model cannot tell us if there is any difference between returning one player who has started for two seasons, or two players who have each started for one season. Without analyzing the data to look for this, I cannot conclude whether these things truly have the same effect on SRS, as I am assuming above, or whether one of these two scenarios is better than the other. This is a weakness of the way that I have done the analysis.
I am always a bit skeptical about conclusions that are drawn solely based on model fitting. I like to be able to look directly at the data, to help see if the conclusions from the model make sense. In order to help visualize how both minutes of experience and the % of minutes played by top 30 recruits affect SRS, I have produced the graph below. It is a complicated graph, but I think it rewards some study. All teams with an SRS greater than 20 are shown with dark blue symbols. Teams with an SRS between 15 and 20 are shown with light blue symbols. All teams with an SRS of less than 15 are shown with gray symbols. The plot also has lines that indicate the predictions of the regression model. I have also included several orange triangles that indicate the locations of the last four Texas teams, as well as an orange circle projecting where I think Texas will fall next season (see below for an explanation).
It seems pretty clear that there is the greatest number of dark blue symbols in the upper right portion of the graph, and the greatest number of gray symbols in the lower left portion of the graph, with a diffuse band of light blue symbols in between. Most of the teams with an SRS greater than 20 have more than around 8000-9000 minutes of experience, but some very talented teams with less experience (such as the 2011 Texas team) also reach this level. There is only one team with fewer than 4000 minutes of experience that had an SRS greater than 20 (Kentucky, 2011).
The regression model predictions seem pretty reasonable, based on that graph. There is obviously a lot of scatter in the data that they do not account for, but they seem to be capturing the basic trends in the average outcomes.
So what does this mean for Texas' chances next year?
It looks like next year will be a rebuilding year for the Texas Longhorns. J'Covan Brown has played just under 1500 minutes in his career and Alexis Wangmene has played just under 1000 minutes. Myck Kabongo will be the only player on next year's roster who was an RSCI top 30 recruit. Based on this, I am estimating that Texas will end up in the area marked by the orange circle on the graph above. The graph isn't very densely populated in the region surrounding where Texas is projected to fall, but we can see that there are a number of teams with both more experience and a higher percentage of minutes played by top 30 recruits that have an SRS less than 15. Given the recent history reflected by the graph, Texas projects to have an SRS in the range of 10 to 15 next season. An SRS of 15 roughly corresponds to being a fringe top 25 team. Rick Barnes' Texas teams have seldom had an SRS that was much less than 15, so I wouldn't be surprised if next year's team exceeds these expectations.
With so many underclassmen playing next year, going into the 2012-2013 season Texas will likely have something like 8,000-9,000 minutes of experience. If Kabongo goes to the NBA after one season, Texas will likely be closer to 8,000, but if he stays, Texas should hit closer to 9,000 minutes of experience. With 9,000 minutes of experience and a top 30 recruit in the lineup, the chances of getting an SRS above 20 and being in the national championship hunt become reasonable.
Wrapping it up
Teams with more top 30 recruits and more experience do better than everyone else. This is not surprising. What is new here is that this work provides a very crude estimate of the relative importance of having a top recruit compared with returning starters.
The approach I have used here is oversimplified, but it does capture some of the general trends related to how recruiting success and experience affect on-court performance in big-time NCAA basketball. And the oversimplification is a kind of advantage here; this approach is simple enough to be usable for looking at a variety of issues.