There was a post in the Mizzou postgame thread from @CurbeloYourEnthusiasm saying the following:
_____________________________________________________________________
[Kofi] had a 6 game stretch where he hit 28/32 FTs. That’s 88%. He’s clearly capable of making them. Not many 50-60% foul shooters ever have that good stretch.
------------------------------------------------------------------------
That got me thinking, how unlikely is this? In other words, given that Kofi had a stretch of 28 for 32 FTs made, what's the probability that he's actually a <60% FT shooter (or was last year, anyway)?
TL;DR -- using only the 28-for-32 information about Kofi, combined with just a little league-wide information, we can estimate that there's about a 1% chance Kofi is really a 60% FT shooter or worse. That's a bit lower than, but still close to, the 2% we would have estimated using the full 2019-20 season data.
Full Analysis
Well, first let's look at the initial statement. "Not many 50-60% foul shooters ever have that good stretch." How many really do?
The "binomial distribution" can give us the probability of N made FTs, out of a random set of 32, for a 60% FT shooter:
The probability of 28 exactly is about 0.06%. If we also add in the probabilities from 29 to 32, we get 0.0007, or a 0.07% chance. That's pretty small, but there are a couple of important caveats here.
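For anyone who wants to check the numbers, here is a minimal sketch of that binomial calculation using only the Python standard library. The function names are my own, and it makes the same assumption as the text: every free throw is an independent shot from a 60% shooter.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k makes in n independent attempts at make-rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k_min, n, p):
    """Probability of k_min or more makes in n attempts."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# A 60% shooter taking 32 free throws
exactly_28 = binom_pmf(28, 32, 0.60)
at_least_28 = binom_tail(28, 32, 0.60)

print(f"P(exactly 28 of 32) = {exactly_28:.4%}")
print(f"P(28 or more of 32) = {at_least_28:.4%}")
```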
First, this assumes every single foul shot is a completely independent, unrelated occurrence. There are a ton of reasons that assumption might not be true: confidence, tiredness in a game or a season, injuries, "streaky shooting". Those almost certainly matter, but for the purposes of a simple analysis, let's let them go.
Second, we didn't just take any random set of shots, but cherry-picked the best stretch (whole games only) from Kofi's freshman season. Picking the best stretch after the fact would make the probability quite a bit higher.
Both of these are really important, and probably mean that our 0.07% estimate could be way off. If we really wanted to address the question, "how many 50-60% shooters have a stretch like 28/32", we would need to account for this. But that's not what I'm after here, and in fact, all we need is some prior data about NCAA-wide foul shooting to come up with a useful correction for our purposes.
(Other note: there's probably a way to correct for the cherry-picking of the "best" stretch directly, but it's either a bit of math I don't know yet, or it would likely involve a lot of tedious counting -- in either case, I won't use it here.)
So far we have found P(make 28+ of 32, given a "true" FT% < 60%). Here I sloppily swapped <60% for =60%, since worse shooters certainly won't be any more likely to hit that many. But what we really want is the opposite: P("true" FT% < 60%, given 28+ made of 32). For that, we can use Bayes' Theorem:
P(A | B) = P(B | A) x P(A) / P(B)
If you haven't seen this notation before, P(A | B) means "probability of A occurring, given that B did occur".
In this case, A = "true FT% < 60%", and B = "made 28+ of 32". So P(A) is the probability of any player having a true FT% under 60%, and P(B) is the probability of any player making 28+ of 32. How does this change our beliefs based only on the 28-for-32 stat? Well, first we need some estimates for P(A) and P(B).
Kofi is a true center. In the NBA (the only place I could find position-specific data), the bottom 1/4 of centers shoot about 65% or worse. Since college players usually have room for improvement, let's say that's about 5 percentage points higher than in the NCAA, so the bottom 1/4 of college centers shoot about 60% or worse. This gives us P(A) = 25%. We could also just use intuition here; to me, this number seems pretty reasonable (maybe a bit aggressive) given that Kofi is <60% this year, but was well over 60% in a full season last year.
We'll estimate P(B) in the same way we figured the first probability -- using the binomial distribution. This way, the same errors we made in finding P(B | A) will also show up in P(B), and they should (hopefully) cancel out to some degree. Since an "average" college hoops player is a 70% FT shooter, we can plug that into the binomial distribution to get P(B) = 1.9%.
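As a rough check on that P(B) figure, the same binomial tail can be computed for a 70% shooter. This is only a sketch: the helper name is mine, and the 70% "average" is the assumption stated above, not something measured here.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """Probability of k_min or more makes in n independent attempts at make-rate p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Assumed "average" college shooter (70%) taking 32 free throws
p_b = prob_at_least(28, 32, 0.70)

print(f"P(B) = P(28+ of 32 | 70% shooter) = {p_b:.2%}")
```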
Putting that all together, we get:
0.07% (odds a 60% FT shooter would make 28+ of 32)
x 25% (odds a college center is a <60% FT shooter)
/ 1.9% (odds an average college player would make 28+ of 32)
= a 1% chance that Kofi is truly a <60% FT shooter, given his 28-for-32 stretch last year.
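The arithmetic is simple enough to do by hand, but here it is as a small sketch so the inputs are easy to swap out. The function name is made up for illustration, and the three inputs are just the rounded estimates from above.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: P(A | B) = P(B | A) x P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

posterior = bayes_posterior(
    p_b_given_a=0.0007,  # 60% shooter makes 28+ of 32 (binomial estimate)
    p_a=0.25,            # assumed share of college centers under 60%
    p_b=0.019,           # 70% shooter makes 28+ of 32 (binomial estimate)
)

print(f"P(true FT% < 60% | made 28+ of 32) = {posterior:.2%}")
```

Changing p_a is a quick way to see how much the answer depends on the guess about college centers: the result scales in direct proportion to it.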
Is there any way to check whether this is right? Well, we can never know Kofi's "true" FT%, but we can at least use more data. We can also use something called the "beta distribution", which will give us what we want directly: a distribution on the likelihood of a certain true FT%, given a certain number of makes and misses.
If we plug Kofi's 2019-20 stats in (111 of 164), we get the following distribution:
As we can see, there's a small but non-zero chance he's a "true" 60% FT shooter or worse, even taking 150+ free throws into account. If we add up the probabilities from 0 to 0.6, we get a better estimate: a 2% chance that Kofi is truly a <60% FT shooter, given his entire 2019-20 season.
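Here is one way to sketch "adding up the probabilities from 0 to 0.6" without any extra libraries. Two assumptions of mine that the text doesn't pin down: I use Beta(makes + 1, misses + 1), which corresponds to starting from a flat prior, and I do the adding-up with Simpson's rule rather than a library CDF. Using Beta(makes, misses) instead shifts the answer only slightly.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Beta(a, b) density at x. Returns 0 at the endpoints, which is valid for a, b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_cdf(x, a, b, steps=2000):
    """Area under the Beta(a, b) density from 0 to x via Simpson's rule (steps must be even)."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

# Kofi's 2019-20 season: 111 makes in 164 attempts
makes, attempts = 111, 164
misses = attempts - makes

# Assumption: flat prior, so the posterior is Beta(makes + 1, misses + 1)
p_below_60 = beta_cdf(0.60, makes + 1, misses + 1)

print(f"P(true FT% < 60% | {makes} of {attempts}) = {p_below_60:.2%}")
```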
These aren't exactly the same, but given that we were handed some very cherry-picked stats to work with, the fact that we ended up off by only a factor of 2 is pretty cool! And it's actually a bit comforting to know that our estimate was still smaller given only the 28-for-32 stat, since that's a pretty good stretch. This was all honest math -- I didn't go back and tweak P(A) or P(B) to get a close answer.
By contrast, if we had just used the binomial distribution alone, or if we had plugged the 28-of-32 stretch into the beta distribution, we would have guessed something 50-100x smaller. Bringing in information about other free throw shooters put that one stretch in context, leading to a better guess.