It’s mid-December and the college hoops landscape is beginning to take shape. Some of the over-hyped preseason darlings have shown us who they really are (shoutout UNC) and the unlikely stars have emerged (hello again, Purdue). But everyone knows the entire CBB season is a lead-up to the only thing that truly matters: March Madness. Every year is a race to figure out which teams “deserve” what seed and which of them are built to avoid the early-round upset. Many will try to convince you it’s all random, that the best team doesn’t actually win, and that the upsets and format diminish the glory found at the end of the most difficult 6-game winning streak in sports. I mean, for crying out loud, they lost to their mom in the family bracket challenge and she picked based on mascots! It must be random.
While I recognize that losing to someone who happened to pick Peacocks (St. Peter’s) to beat Wildcats (Kentucky) can lead to this kind of thinking, I’m here to tell you definitively how to avoid betting on a doomed horse come March. I’m not going to give you a surefire way to know every upset, nor am I claiming to have the ability to get a perfect bracket. I’m more concerned with how to avoid backing a severely vulnerable horse to make a deep run, and how to spot the group of teams that look destined for trouble in that wild 1st weekend we all love. This will be a purely data-driven set of rules, based on pre-tourney Kenpom numbers going back to 2005. That’s 16 tournaments’ worth of data to help paint a picture of which teams could face trouble and which teams are impenetrable, and to finally put to rest the “best team doesn’t actually win” narrative. Let’s get it.
To start, let’s take a look at the most generic pool: the teams that entered the Madness rated as the best in the country per Kenpom. We’ll take the top-10 teams each year, giving us 160 teams and leaving out all the rest. For context, that pool includes 61 of the 64 1 seeds in our data (the other 3 all failed to make the Final Four) and extends all the way down to a couple of 8 seeds (1 of which made a Sweet 16 run). I could go on and on with little nuggets of information, like how 25 of the 32 teams to reach the Natty over these 16 years came from this pool, but what really matters is how we pare this group down into pretenders and contenders. Is there an analytical approach to identifying which of those top-10 teams will rise? Of course there is, and that’s why we’re here.
Rule #1: Don’t back the Vulnerable
First, let’s look at those that didn’t make a run. Only 15 of the 160 teams failed to get out of the 1st round (about 9.4%). Kenpom gives each team a defensive efficiency score and an offensive efficiency score, culminating in an overall efficiency rating. When you look at the data, something jumps out immediately: many of the teams that suffered a 1st round exit leaned heavily on one end of the floor. A recent example is 2021 Ohio St, which as a 2 seed lost to 15 seed Oral Roberts. They ranked 4th offensively but 79th defensively (7th overall). They were what I would now categorize as “vulnerable”, and they share that unbalanced distinction with other early exiters like 2014 Duke, 2013 Georgetown and 2012 Missouri, to name just a few.
In fact, when you look at teams that are top-10 in one category but sub-50 in the other, the numbers start to paint a picture. Teams of this archetype that are also top-10 overall are rare: there have been 13 in 16 years, and only 5 made it out of the 1st weekend. All 13 were 1-4 seeds. So 5/13, or 38.5%, made it to the Sweet 16, compared with top-4 seeds in general, who make it that far at a 64% clip regardless of where they rank. If you expand the pool to include teams outside the overall top-10, it follows a similar pattern: 44 total archetypal teams, with only 20 (45.5%) getting out of the 1st weekend. And yes, the pattern continues as you advance further into the tournament. Only 2 of the 44 (4.5%) made the Final Four, while top-4 seeds have accounted for 49 of the 64 Final Four teams over this period. This archetype has clearly struggled against the average and should be considered “vulnerable”: a team you should not trust to make a deep run, and should never pick to win the whole thing, because it has never happened. Only 2012 Louisville and 2013 Michigan have been able to break the mold and make it to the Final Four.
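If you want to run the “vulnerable” check on your own list of teams, here is a rough Python sketch. The cutoffs (top-10 on one end, worse than 50th on the other) come straight from the rule above; the function and variable names are made up for illustration, and the ranks are whatever you pull from Kenpom yourself.

```python
# Cutoffs from Rule #1; the names here are illustrative, not from Kenpom.
ELITE_CUTOFF = 10   # "top-10" on one end of the floor
WEAK_CUTOFF = 50    # "sub-50" means ranked worse than 50th


def is_vulnerable(off_rank: int, def_rank: int) -> bool:
    """True when a team is top-10 on one end and sub-50 on the other."""
    elite_off_weak_def = off_rank <= ELITE_CUTOFF and def_rank > WEAK_CUTOFF
    elite_def_weak_off = def_rank <= ELITE_CUTOFF and off_rank > WEAK_CUTOFF
    return elite_off_weak_def or elite_def_weak_off


def pct(made: int, total: int) -> float:
    """Share of teams that advanced, as a percentage rounded to one decimal."""
    return round(100 * made / total, 1)


# 2021 Ohio St: 4th on offense, 79th on defense.
print(is_vulnerable(off_rank=4, def_rank=79))  # True

# Counts quoted in the text above.
print(pct(5, 13))   # 38.5  (top-10 overall, reached the Sweet 16)
print(pct(20, 44))  # 45.5  (all archetypal teams, reached the Sweet 16)
print(pct(2, 44))   # 4.5   (all archetypal teams, reached the Final Four)
```

Keeping the cutoffs as named constants makes it easy to see how the group changes if you loosen or tighten the definition.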
Rule #2: Bet on the Impenetrable
As you’ve likely gathered if you’ve made it this far, we are trying to separate the wheat from the chaff and establish a set of standards for identifying the uncommon amongst the uncommon. Time to shift the focus to the archetype I have deemed the “impenetrable”. These are the teams you can back with the most confidence, and no, I’m not just saying the four 1 seeds. We’re going to stay with Kenpom and our 16 years of data, focusing on the antithesis of the vulnerable: teams that were top-15 in both offensive and defensive efficiency. There have been 54 such teams over the last 16 tournaments, and none of them lost in the 1st round. This group ranges from 1 seeds all the way down to a couple of 5 seeds, with similar results regardless of seed. Once we move beyond the 1st round, things get interesting. Again, I’m not arguing for perfection, but 45 of the 54 teams made it to the 2nd weekend. That’s an 83% clip, which does leave a small ounce of doubt about these teams. But the fact is that 93 top-4 seeds have been upset prior to the Sweet 16, and only 9 of those were from this group. If you’re trying to identify who is going to escape the carnage of the first weekend, this is the group with the best statistical chance to do so.
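For anyone who wants to filter a field for this archetype, here is a minimal Python sketch. The top-15 cutoff is the one from the rule above; the `Team` class, the team names, and the ranks in the sample field are hypothetical and only there to show the filter working.

```python
from dataclasses import dataclass

BALANCED_CUTOFF = 15  # "top-15 in both" per Rule #2


@dataclass
class Team:
    name: str
    off_rank: int  # offensive efficiency rank
    def_rank: int  # defensive efficiency rank


def is_impenetrable(team: Team) -> bool:
    """True when a team is top-15 on both ends of the floor."""
    return team.off_rank <= BALANCED_CUTOFF and team.def_rank <= BALANCED_CUTOFF


# Hypothetical ranks, for illustration only.
field = [
    Team("Team A", off_rank=3, def_rank=11),
    Team("Team B", off_rank=2, def_rank=40),
    Team("Team C", off_rank=15, def_rank=15),
    Team("Team D", off_rank=16, def_rank=1),
]
impenetrable = [t.name for t in field if is_impenetrable(t)]
print(impenetrable)  # ['Team A', 'Team C']

# Counts from the text: 93 top-4 seeds upset before the Sweet 16,
# 9 of them from the impenetrable group.
share = round(100 * 9 / 93, 1)
print(share)  # 9.7
```

Note how Team D misses the cut by a single spot on offense despite having the top defense; the rule is about balance, not peak strength.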
Rule #3: Stop trying to predict the Cinderella
We all love the Cinderella, the Loyola Chicago or Butler magical runs that captivate us all. But you have to acknowledge there is no possible way to know, for sure, who is going on that run. So why send a 6 seed to the Final Four? Why take the 30th-ranked team overall on a deep run? You don’t know. Yes, there is going to be one; in fact, over the last 16 tournaments only twice have all 4 Final Four teams been in the top-10 of Kenpom (the last time was 2008). There is almost always 1 outsider that makes a run, rarely two. The real trick is the “flyers”: teams that go on runs but don’t fall into the perfect category. They are the hardest to spot, but you’ll need to try, because the list of impenetrable teams can be small. The data is murkier on these teams, but when you take out the Cinderellas (6 seeds and below) and the Impenetrables, you are left with a list of teams that shows some trends. They are most often elite at one end of the floor (top-10) and/or on the fringes of our top-15-in-both archetype. Focus on these after that first group, and do not be a hero.
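Putting the three rules together, a rough tiering could look like the Python sketch below. The top-15, top-10 and sub-50 cutoffs come from the rules above. The “fringe” cutoff of 25 is purely my assumption for this sketch, since the text does not pin down what “on the fringes” means, and the function name and tier labels are made up.

```python
FRINGE_CUTOFF = 25  # ASSUMPTION: "fringes of top-15 in both" read as top-25 in both


def label_team(seed: int, off_rank: int, def_rank: int) -> str:
    """Assign a rough tier from seed and offensive/defensive efficiency ranks."""
    best, worst = min(off_rank, def_rank), max(off_rank, def_rank)
    if worst <= 15:
        return "impenetrable"          # Rule #2: top-15 on both ends
    if best <= 10 and worst > 50:
        return "vulnerable"            # Rule #1: elite on one end, sub-50 on the other
    if seed >= 6:
        return "cinderella territory"  # Rule #3: don't try to predict these
    if best <= 10 or worst <= FRINGE_CUTOFF:
        return "flyer"                 # elite on one end and/or near top-15 in both
    return "pass"


# Hypothetical seeds and ranks, except (2, 4, 79), which is 2021 Ohio St.
print(label_team(1, 5, 9))    # impenetrable
print(label_team(2, 4, 79))   # vulnerable
print(label_team(3, 8, 30))   # flyer
print(label_team(4, 18, 22))  # flyer
print(label_team(7, 30, 35))  # cinderella territory
print(label_team(5, 30, 40))  # pass
```

The order of the checks matters: a team is tested for impenetrable and vulnerable first, so the flyer label only goes to teams that fit neither archetype.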
So, this series will be all about tracking the teams in the top-10: who is falling out and who is rising in. We’ll be definitively labeling those that are “vulnerable” and those that appear “impenetrable”. All of it will ebb and flow, but the once-a-week recap found here will be a great barometer for which teams are trending which way and how they look heading into March. Eventually it will all be measured against the actual tournament and the actual results. Year 17 of data is to come, but this year we enter with a game plan to stop losing your bracket pool or fantasy league to all the people who don’t even watch the sport.