An Introduction to Advanced Basketball Statistics: Understanding Possession Estimation and the Factors that Control Efficiency

The advanced statistical revolution has come to basketball in a big way.  With the new approaches that have been developed for analyzing the game, combined with the quality and quantity of information available on the Internet, the tools to arrive at a deeper understanding of the game of basketball are widespread. 

If you have been reading my blog posts at this site, you can probably tell that I am very enthusiastic about the use of some of the new basketball statistics.  It makes sense to take some time to explain a few of these statistics, and the logic behind them.  While this requires us to use math, none of the math needed is difficult.  More important is that we do some thinking.

Per Possession Stats:  Why We Use Them, and How We Estimate Them

Historically, the way we tend to evaluate basketball teams is to look at per game statistics.  For instance, we might look at the Big 12 statistics from last season and conclude Nebraska, who allowed 60.5 points per game, and Texas A&M, who allowed 61.3 points per game, were both better defensively than Texas, who allowed 62.2 points per game.  The chief problem with this sort of analysis is that it ignores the pace at which these teams play.  Texas A&M played at an extremely slow pace last season, averaging 62.2 possessions per game.  Nebraska played slightly faster, averaging 63.6 possessions per game.  Of these three schools, Texas played at the fastest tempo, averaging 66.9 possessions per game.  Nebraska and Texas A&M were among the slowest-paced teams in all of Division I last year, whereas Texas' pace was pretty close to the Division I median of 66.6 possessions per game.

Per possession statistics help us to put all teams on an equal footing, no matter what tempo they play at.  A team that averages 70 possessions per game is likely going to both score and give up more points than a team that averages 60 possessions per game.  So when comparing two teams, we should look at how many points per possession each team scored and allowed, rather than simply comparing the per game statistics.  This is the reason that per possession statistics form the basis for the rankings available at sites like kenpom.com.
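To make the point concrete, here is a rough sketch that converts the per game numbers quoted above into points allowed per possession.  It assumes each team's opponents get about as many possessions per game as the team itself does, which is an approximation, and the variable and function names are just illustrative.

```python
# Per game figures quoted above for last season's Big 12.
# Assumption: opponents get about as many possessions per game as the team does.
teams = {
    "Nebraska": {"points_allowed_per_game": 60.5, "possessions_per_game": 63.6},
    "Texas A&M": {"points_allowed_per_game": 61.3, "possessions_per_game": 62.2},
    "Texas": {"points_allowed_per_game": 62.2, "possessions_per_game": 66.9},
}


def points_allowed_per_possession(stats):
    """Normalize points allowed by pace instead of by game."""
    return stats["points_allowed_per_game"] / stats["possessions_per_game"]


per_possession = {
    name: points_allowed_per_possession(stats) for name, stats in teams.items()
}

# Sort from stingiest defense to most generous.
ranking = sorted(per_possession, key=per_possession.get)

for name in ranking:
    print(f"{name}: {per_possession[name]:.3f} points allowed per possession")
```

Under that assumption the order flips:  Texas comes out at roughly 0.93 points allowed per possession, ahead of Nebraska (about 0.95) and Texas A&M (about 0.99), even though Texas allowed the most points per game of the three.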

This is not some sort of newfangled idea.  Dean Smith was using per possession statistics back in his coaching days.  But for whatever reason, their widespread use seems to be more recent.  That may be because up until recently, per possession statistics were a pain to get.  Dean Smith probably had an assistant tracking them for his team, but the rest of us didn't have that luxury.  Newspapers weren't reporting the total number of possessions in a game in box scores.

The information age has resolved many of these problems.  We now have on-line game play-by-play logs that one could easily use to determine how many possessions occurred in a particular game.  Additionally, there are formulas that can do a pretty good job of estimating the total number of possessions in a game from standard box score statistics.  Dean Oliver describes some of these estimation methods in his book, Basketball on Paper.  I want to take a bit of time explaining one of these estimation methods here, because it forms the basis for understanding a number of other important concepts.

I want to present a simple "derivation" of the basic possession estimating formula described by Oliver.  I promise that this isn't very complex.  Let's start off by imagining that the game of basketball has slightly different rules from how it is actually played.  The difference in the rules for our imagined game is that the only free throws come on two shot fouls.  There are no "one and one" free throws, players are awarded only two free throws if they are fouled shooting a three point shot, and if a player is fouled shooting and the shot goes in, the player does not get an extra free throw.  In a game with these rules, it would be very easy to estimate the number of possessions for a team from box score statistics.  We could do this because each possession ends with something recorded in the box score.  A possession can end when:  (1) a field goal attempt is made, (2) a field goal attempt is missed and rebounded by the defense, (3) the second free throw of a pair is made, (4) the second free throw of a pair is missed and rebounded by the defense, (5) the offense turns the ball over, or (6) the half ends.  If we don't worry about a possession that doesn't generate a shot at the end of the half, we can estimate the possessions using the equation

poss = FGA + 0.5 x FTA - ORB + TO

Here FGA is the number of field goal attempts, FTA is the number of free throw attempts, ORB is the number of offensive rebounds, and TO is the number of turnovers.  The factor of 0.5 appears because, in our imagined game, every trip to the line produces exactly two free throw attempts.  Possessions end on turnovers and on shots that the offense doesn't rebound.  Nice and simple.

That possession estimator is not quite right.  A real basketball game doesn't have these nice and tidy rules regarding free throws that allow us to use this relationship.  The solution to this problem has been to take the equation above and adjust it slightly so that possession estimates end up agreeing pretty closely with the actual number of possessions in a game.  The equation above is adjusted by changing the prefactor for FTA.  I have seen a few different values used here.  They are generally in the range between 0.4 and 0.5.  The consensus value for college basketball seems to be 0.475.  In the NBA, 0.44 is typically used.  One consequence of this difference is that the way we should calculate true shooting percentage in the NCAA (described below) will be slightly different from the formulas for the NBA that you will find on the Internet, although the practical difference is probably negligible.  Anyway, the more realistic possession estimator is

poss = FGA + 0.475 x FTA - ORB + TO
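As a minimal sketch, the estimator can be written as a small function with the free throw prefactor left adjustable.  The box score line below is hypothetical, made up only to show how much the choice of prefactor moves the estimate, and the function and parameter names are my own.

```python
def estimate_possessions(fga, fta, orb, to, ft_weight=0.475):
    """Estimate a team's possessions from box score totals.

    ft_weight is the prefactor on free throw attempts: 0.5 in the
    idealized two-shot-foul game, 0.475 for college, 0.44 for the NBA.
    """
    return fga + ft_weight * fta - orb + to


# Hypothetical box score line, for illustration only.
box = {"fga": 55, "fta": 20, "orb": 10, "to": 13}

for label, weight in [("idealized", 0.5), ("college", 0.475), ("NBA", 0.44)]:
    poss = estimate_possessions(**box, ft_weight=weight)
    print(f"{label:>9} (prefactor {weight}): {poss:.1f} possessions")
```

For this made-up line the three prefactors give 68.0, 67.5, and 66.8 possessions, so the choice matters by about a possession per game for a team that shoots 20 free throws.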

 

The Two Levers of Performance

If you are convinced that possessions, and not games, ought to be the appropriate way to normalize basketball statistics, then you are ready for the next step.  Team performance on offense and defense should be measured as points per possession.  It is useful to break points per possession up a bit further, in order to isolate various elements of offensive (or defensive) performance.  In the coming months I will write about some of the ways in which I like to do this, but for now, we will focus on a fairly simple split.  Here is an equation

points/poss = points/(FGA + 0.475 x FTA)   x   (FGA + 0.475 x FTA)/poss

I haven't done anything crazy there.  I have just taken points per possession, multiplied and divided it by the same number, and rearranged things.  For those of you who are math types, or at least remember some of it, this is very reasonable.  For those of you who are not math types and are for whatever reason still reading this, I promise you that what I did was completely kosher.  Although this is a mathematically trivial step, it is still useful conceptually.  We see that there are two basic levers a team has at its disposal to either maximize or minimize points per possession.  The first lever is

points/(FGA + 0.475 x FTA)

which is just the number of points a team can get per shooting opportunity (either a field goal attempt or a trip to the line).  This lever is simply a team's shooting efficiency.  The second lever is

(FGA + 0.475 x FTA)/poss

which is the number of field goal attempts and trips to the line a team gets per possession.  A team improves this number on offense by getting offensive rebounds and protecting the ball.

This all may seem abstract, but analyzing these two basic levers after a game is over will often allow us to understand what has happened.  Winning and losing in basketball generally boils down to making more efficient use of your shots and trips to the line than your opponent does, and creating more shots and trips to the line than your opponent.
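Here is a short sketch of the split, assuming the college prefactor of 0.475 and a hypothetical set of team totals.  It computes the two levers separately and confirms that their product is the same as points per possession.

```python
FT_WEIGHT = 0.475  # college prefactor on free throw attempts


def two_levers(points, fga, fta, orb, to):
    """Split points per possession into the two levers.

    Returns (points per shot opportunity,
             shot opportunities per possession,
             points per possession).
    """
    shot_opps = fga + FT_WEIGHT * fta
    poss = shot_opps - orb + to          # possession estimator
    efficiency = points / shot_opps      # first lever
    opps_per_poss = shot_opps / poss     # second lever
    return efficiency, opps_per_poss, points / poss


# Hypothetical team totals, for illustration only.
efficiency, opps_per_poss, ppp = two_levers(points=72, fga=55, fta=20, orb=10, to=13)

print(f"points per shot opportunity:       {efficiency:.3f}")
print(f"shot opportunities per possession: {opps_per_poss:.3f}")
print(f"product of the two levers:         {efficiency * opps_per_poss:.3f}")
print(f"points per possession:             {ppp:.3f}")
```

The last two printed lines agree, which is all the multiply-and-divide step says.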

True Shooting Percentage

By convention, the shooting efficiency lever I described above is expressed as "true shooting percentage."  To be honest, I really dislike this term.  I will explain why below.  First, let's present the formula for true shooting percentage (TS%) that is NCAA appropriate

TS% = 0.5 x points/(FGA + 0.475 x FTA)

Essentially, true shooting percentage is the points scored per shooting opportunity divided by two.  The number is divided by two so that it looks like a shooting percentage.  Typically many teams and players will have TS% between about 0.5 and 0.6, so the scale is similar to what is common for more traditional field goal percentages.  The problem with true shooting percentage is semantic: it isn't actually a percentage.  For example, TS% can be greater than 1.  It can be as high as 1.5, which is what a player who makes every three point attempt and takes no free throws would have.  The 0.5 prefactor seems kind of pointless, other than as an effort to make this look like a field goal percentage.

While the name and prefactor annoy me, I still really like true shooting percentage.  I think it is one of the most useful of all of the advanced basketball statistics.  It works for both teams as well as individual players.  When combined with measures of shooting frequency, it tells us a lot about what happened in a particular basketball game, and can be a really useful starting point when trying to scout a team.
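A minimal sketch of the TS% calculation follows, with the free throw prefactor as a parameter so the college (0.475) and NBA (0.44) versions can be compared.  Both stat lines are hypothetical and the function name is my own.

```python
def true_shooting(points, fga, fta, ft_weight=0.475):
    """TS% = 0.5 x points / (FGA + ft_weight x FTA)."""
    return 0.5 * points / (fga + ft_weight * fta)


# Hypothetical player line: 20 points on 15 field goal attempts and 6 free throws.
college_ts = true_shooting(points=20, fga=15, fta=6)
nba_ts = true_shooting(points=20, fga=15, fta=6, ft_weight=0.44)

# Hypothetical extreme: 4 for 4 from three, no free throws.
perfect_threes = true_shooting(points=12, fga=4, fta=0)

print(f"college prefactor: {college_ts:.3f}")
print(f"NBA prefactor:     {nba_ts:.3f}")
print(f"all threes made:   {perfect_threes:.3f}")
```

For the first line the two prefactors give about 0.560 and 0.567, which illustrates how small the practical difference is.  The second line hits the 1.5 ceiling, showing why this is not really a percentage.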

To show just one way that I like to use true shooting percentage, I have created the figure below.  This figure uses the results of the Texas vs. Arizona game in last season's NCAA tournament.  It plots the true shooting percentage of each player vs. the % of Arizona's shot opportunities that player used.  (Shot opportunities is just FGA + 0.475 x FTA.)  The big purple squares are the results from the Texas vs. Arizona game, while the small black diamonds are the season averages for each player.  I have used arrows to help indicate the differences between the game results and the season averages.  This figure tells us a lot about the Arizona offense and Texas defense in that game.

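[Figure:  true shooting percentage vs. % of Arizona's shot opportunities used, for each Arizona player.  Big purple squares are the Texas vs. Arizona game; small black diamonds are season averages.]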

A figure like this graph really tells a story.  Derrick Williams was held in check, but took a lot of shots.  His season averages of a 0.69 TS% while taking 21.6% of his team's shots were ridiculously good numbers.  Against Texas, he had a TS% of 0.4 (pretty poor), while taking a third of his team's shots.  The problem was that while Texas was putting the clamps on Derrick Williams, Solomon Hill was having a pretty good game, and several other players were killing Texas from the three point line.  Fogg, Mayes, and Lavender combined to take roughly 18% of Arizona's shots, and all three players had a TS% greater than 1 (meaning they averaged more than 2 points per shot opportunity).


Shot Opportunities:  the Second Lever of Performance

Shooting efficiency, as measured by true shooting percentage, only tells part of the story.  The second major lever a team has to improve (or reduce) scoring is the number of shooting opportunities per possession.  These are related to offensive rebounds and turnovers.  Recall that we would like to determine

(FGA + 0.475 x FTA)/poss

Let's take a look at the possession estimator formula

poss = FGA + 0.475 x FTA - ORB + TO

If we divide by the total number of possessions, and rearrange, we get

(FGA + 0.475 x FTA)/poss = 1 + (ORB - TO)/poss

This result is pretty sensible.  On each possession where there are no offensive rebounds or turnovers, a team gets one shot opportunity.  Offensive rebounds will increase the number of opportunities, while turnovers will decrease them.

When looking at the box score for a game, I like to calculate FGA + 0.475 x FTA for each team.  If there is a big disparity, or even just a small one that was important, looking at each team's offensive rebounding and turnover totals usually explains what has happened.  Every offensive rebound creates an extra chance to score for a team, and every turnover takes away one of those chances.
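The sketch below checks the rearranged formula numerically and shows the box score habit described above.  The two team lines are hypothetical, chosen so that both teams have the same estimated number of possessions, and the function names are my own.

```python
FT_WEIGHT = 0.475  # college prefactor on free throw attempts


def shot_opportunities(fga, fta):
    """FGA + 0.475 x FTA."""
    return fga + FT_WEIGHT * fta


def opportunities_per_possession(fga, fta, orb, to):
    """Return the second lever computed two ways: directly, and as 1 + (ORB - TO)/poss."""
    opps = shot_opportunities(fga, fta)
    poss = opps - orb + to
    return opps / poss, 1 + (orb - to) / poss


# Hypothetical box score lines for two teams in the same game.
team_a = {"fga": 55, "fta": 20, "orb": 10, "to": 13}
team_b = {"fga": 60, "fta": 20, "orb": 13, "to": 11}

for name, line in [("Team A", team_a), ("Team B", team_b)]:
    direct, via_identity = opportunities_per_possession(**line)
    print(f"{name}: direct {direct:.4f}, via ORB and TO {via_identity:.4f}")

# With equal possessions, the gap in shot opportunities equals the gap in ORB - TO.
opps_gap = shot_opportunities(team_b["fga"], team_b["fta"]) - shot_opportunities(
    team_a["fga"], team_a["fta"]
)
orb_to_gap = (team_b["orb"] - team_b["to"]) - (team_a["orb"] - team_a["to"])
print(f"shot opportunity gap: {opps_gap:.2f}, ORB - TO gap: {orb_to_gap}")
```

In this made-up game Team B ends up with five more shot opportunities, and all five are accounted for by its edge in offensive rebounds and turnovers.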

An Example of How to Apply These Ideas to a Single Game

To show an example of how the numbers tend to work out, let's look back at the painful box score from last season's Texas vs. Arizona game.  I like to start by calculating TS%, to see if there was a big difference between the two teams.

Texas' TS% = 0.544 

Arizona's TS% = 0.543 

This is not a big difference; both offenses had essentially equal efficiencies.  Let's look at who had more opportunities to score.

Texas' FGA + 0.475 x FTA = 63.45

Arizona's FGA + 0.475 x FTA = 64.45

Arizona got an extra shot.  In a game where the two teams have equal efficiencies with their shots, the team with more shots wins.  On average, an extra shot is worth roughly one point in scoring differential, and in this case Arizona won the game by exactly one point. 

How did Arizona get that extra shot?  The next step is to look at offensive rebounds and turnovers.  Of course, there was one turnover that we all probably remember, but in a game this close every turnover and rebound counts.  I like to calculate ORB - TO for each team in games like this.

Texas' ORB - TO = -3  (ORB=10, TO=13)

Arizona's ORB - TO = -1  (ORB = 11, TO=12)

Then if we take the difference between Texas' ORB - TO and Arizona's ORB - TO, we get an estimate of the difference in shooting opportunities between the two teams.  By this calculation, Arizona is expected to be +2 in FGA + 0.475 x FTA.  They were actually +1.  In this case, the discrepancy comes from the fact that Texas had one more possession than Arizona did in the first half of the game.  Both teams had the same number of possessions in the second half of the game.  Note that sometimes this estimate will be off because it is just an estimate, but in this case we have an assignable cause.
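The arithmetic for this game can be retraced with a short sketch that uses only the numbers quoted above.  The implied point totals are approximate because the TS% values are rounded to three places.  Note also that the possession gap here comes from the estimator, so it reconciles the +2 and +1 by construction rather than independently confirming the play-by-play count.

```python
# Numbers quoted above for the Texas vs. Arizona game.
game = {
    "Texas": {"ts": 0.544, "shot_opps": 63.45, "orb": 10, "to": 13},
    "Arizona": {"ts": 0.543, "shot_opps": 64.45, "orb": 11, "to": 12},
}

for team in game.values():
    team["orb_minus_to"] = team["orb"] - team["to"]
    # Possession estimator: poss = (FGA + 0.475 x FTA) - ORB + TO
    team["est_poss"] = team["shot_opps"] - team["orb"] + team["to"]
    # Inverting TS% = 0.5 x points / shot_opps (approximate, TS% is rounded)
    team["implied_points"] = 2 * team["ts"] * team["shot_opps"]

tex, ari = game["Texas"], game["Arizona"]

expected_edge = ari["orb_minus_to"] - tex["orb_minus_to"]  # from ORB - TO
actual_edge = ari["shot_opps"] - tex["shot_opps"]          # from FGA + 0.475 x FTA
possession_gap = tex["est_poss"] - ari["est_poss"]         # Texas minus Arizona

print(f"expected Arizona edge in shot opportunities: {expected_edge:+d}")
print(f"actual Arizona edge in shot opportunities:   {actual_edge:+.2f}")
print(f"estimated extra Texas possessions:           {possession_gap:+.2f}")
for name, team in game.items():
    print(f"{name}: about {team['implied_points']:.1f} points implied")
```

The implied totals come out to about 69 for Texas and 70 for Arizona, consistent with the one point margin.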

Some Stats Aren't for Losers

The whole point of advanced statistics is that they help us to understand wins and losses, an area where traditional statistics were often lacking.  By using per possession statistics, and accounting for shooting efficiency and the things that lead to extra shots, we can evaluate team offenses and defenses in a statistical way that helps us understand what makes a team successful (or less than successful).  These methods allow us to understand both offense and defense at a team level.

I am traveling for work this week, so feel free to blast me in the comments section.  I probably won't be around to defend myself.