How Often Does the Best Team Win the Title?

Author:

Robert Robison

Date Published:
November 17, 2022

Dodgers players watch from the dugout during the ninth inning of the team’s season-ending loss to the San Diego Padres in Game 4 of the NLDS on Oct. 15. (Wally Skalij / Los Angeles Times)

On October 15th, something shocking happened: two of the best teams in baseball were eliminated on the same day—in the first round of the playoffs. The Los Angeles Dodgers won 111 games in the regular season, 5 more than the second best team, and were the favorites to win the World Series. The defending champion Atlanta Braves had the best record in baseball over the last two-thirds of the season. Many expected both to make a deep run and compete for the title.

This was a popular discussion topic in the following days as analysts and armchair managers tried to make sense of this surprising development. But it raises the question: just how surprising was it?

Placing Results in Context

Placing a surprising result in context (i.e., learning its true level of surprise) is a rare discipline in model development and sports analysis alike.[1] You might expect that both the Dodgers and Braves losing in the first round would be very unlikely to happen by chance, but if we use the betting markets as a guide, it’s not that surprising: about 1 in 7.[2] Each team had less than a two-thirds chance of advancing. While still unlikely, it’s more like Christmas falling on a Monday than being struck by lightning.
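To make the arithmetic concrete, here is a minimal sketch of how two individual upset chances combine. The 0.62 advance probabilities are hypothetical placeholders (the article only says each was below two-thirds), and the calculation assumes the two series are independent.

```python
def prob_both_eliminated(p_advance_a: float, p_advance_b: float) -> float:
    """Probability that both favorites lose, assuming independent series."""
    return (1.0 - p_advance_a) * (1.0 - p_advance_b)


# Hypothetical advance probabilities, chosen only for illustration.
p_both_out = prob_both_eliminated(0.62, 0.62)
print(f"{p_both_out:.3f} (about 1 in {1 / p_both_out:.1f})")
```

With inputs in that neighborhood, the joint probability lands close to the 1-in-7 figure quoted above.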

Is there something wrong here? Shouldn’t the best teams have a little more advantage in the first round of the playoffs? Isn’t the point of the playoffs to determine the actual best team? But if the best team is losing more than a third of the time in the first round, their advantage is not large.

When we help clients use modern Machine Learning tools to make sense of their data and drive decision-making, one of the most common mistakes we see initially is that their models are not aligned with their goals. They may want to optimize profit, for example, while their model is optimizing revenue instead. In the MLB case, it appears playoffs are optimizing a different target than finding the best team. But perhaps this is intentional…


    1. An especially powerful method for placing results in context is Target Shuffling, which builds, for a given situation, a probability distribution of “interestingness”; the proportion of that distribution scoring higher than your finding is then the probability that chance alone could have produced it.
    2. Moneylines can be used to calculate implied win probabilities (e.g., Implied Probability Calculator for Sports Betting Odds (gamingtoday.com)). Moneyline data can be found here: Dodgers and Braves
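The Target Shuffling idea in note 1 can be sketched as a simple permutation test. This is only an illustration under stated assumptions: the "interestingness" score here is the absolute Pearson correlation, the function names are hypothetical, and ties count as "at least as interesting" as the real finding.

```python
import math
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as the 'interestingness' score."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(var_x * var_y)


def target_shuffle_p_value(inputs, targets, n_shuffles=2000, seed=0):
    """Share of shuffled-target scores that match or beat the real score."""
    rng = random.Random(seed)
    actual = abs_correlation(inputs, targets)
    shuffled = list(targets)
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between inputs and targets
        if abs_correlation(inputs, shuffled) >= actual:
            at_least_as_strong += 1
    return at_least_as_strong / n_shuffles


# Made-up data with a strong linear relationship.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print(target_shuffle_p_value(xs, ys))
```

A small result means shuffled data rarely looks as interesting as the real finding, so chance is an unlikely explanation. The sketch does not guard against constant inputs, where the correlation is undefined.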

Excitement vs Accuracy

So why even play the playoffs at all? Doesn’t the regular season champion, decided over many more games, better identify the top team? We’ll find the answer to that below, but the question overlooks one of the main goals of the playoffs: excitement. TV ratings for postseason games are significantly higher than for regular season games, and they increase with the excitement (and randomness) of the individual games.

Take March Madness, the men’s college basketball postseason tournament. Its large field and single-elimination format combine to fill it with chaos and make it almost impossible to predict perfectly. The end result: the widest audience of any sporting event. Clearly, postseason tournaments weigh excitement in addition to crowning the best team.

Playoffs across every sport exist along a spectrum from excitement to accuracy. On one end is March Madness, where the chaos is embraced and there seems to be an understanding that the best team doesn’t always win it all. On the other end is the National Basketball Association (NBA), where a couple of months of seven-game series usually lead to the best team being crowned.

But how can we quantify this?

How Likely is the Best Team to Win?

Let’s quantify excitement vs. accuracy by defining the “actual best team” as the one with the highest power rating, then seeing how often it wins the title. To estimate this, we ran 10,000 simulated seasons in each of 6 major American sports: professional football, basketball, baseball, and hockey, plus college football and basketball.

The first step is to quantify the amount of randomness baked into a single game in each sport. We pulled moneyline data from the Sports Betting Archives, using it to estimate implied win probability for all games in the most recent full season in every sport. This led to the win probability distributions shown in Figure 1 below.
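As a rough sketch of that conversion, the snippet below turns American moneylines into implied win probabilities using the standard formulas. The rescaling step that removes the bookmaker's margin is an assumption (the article does not say whether or how this was done), and the example odds are hypothetical.

```python
def implied_probability(moneyline: int) -> float:
    """Implied win probability from an American moneyline (e.g. -150 or +130)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def no_vig_probabilities(moneyline_a: int, moneyline_b: int) -> tuple:
    """Rescale both sides so they sum to 1 (assumed way to strip the margin)."""
    p_a = implied_probability(moneyline_a)
    p_b = implied_probability(moneyline_b)
    total = p_a + p_b  # exceeds 1 when the book takes a margin
    return p_a / total, p_b / total


# Hypothetical odds: a -150 favorite against a +130 underdog.
print(implied_probability(-150))        # raw implied probability for the favorite
print(no_vig_probabilities(-150, 130))  # margin-adjusted pair
```

Applying this to every game in a season gives the kind of win probability distribution described above.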

There are a few notable things on this graph:

• Baseball (MLB) and hockey (NHL) have the tightest distributions, meaning their single games have the most noise. Their scoring is rare enough that randomness has a significant effect and the best team beating the worst team isn’t a slam dunk.
• Football and basketball match pretty well, with a wider distribution in college because there are more teams.
• There’s a dip in the center of the distribution for several of the sports, indicating oddsmakers don’t like to give close-to-even odds in a game. Perhaps that payoff is not significant, or it would lead to fewer people betting?

Using this information, we can simulate a season with fake teams. The teams are given a power rating to quantify how good they truly are. These power ratings are structured such that the win probabilities in the simulation roughly match the win probabilities in real life, as shown in Figure 2.

For more details on the simulation methodology, see the Appendix.
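A minimal sketch of the single-game simulator described in the Appendix might look like the following. The function names are hypothetical; the logic (sigmoid of the rating difference, then a uniform random draw) follows the Appendix steps.

```python
import math
import random


def win_probability(rating_a: float, rating_b: float) -> float:
    """Sigmoid of the rating difference: chance that team A beats team B."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def simulate_game(rating_a: float, rating_b: float, rng: random.Random) -> bool:
    """Return True if team A wins a single simulated game."""
    return rng.random() < win_probability(rating_a, rating_b)


rng = random.Random(42)
# Illustrative ratings: team A is half a rating point better than team B.
a_wins = sum(simulate_game(0.5, 0.0, rng) for _ in range(10_000))
print(win_probability(0.5, 0.0), a_wins / 10_000)
```

Evenly matched teams get a 50% chance each, and the simulated win rate converges on the sigmoid probability as the number of games grows.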

 

Figure 2 – Implied Win Probabilities (red) compared to win probabilities from simulations (blue)

All that’s left is to draw from these distributions to simulate the regular and post-season in all six sports 10,000 times. In these simulations, we have information we don’t in real life—we know which team is actually the best, since the power ratings represent the true quality of each team.

 

We can use this information to measure how likely the real best team is to win the title and/or finish with the best record in the regular season (Figure 3).

Figure 3 – Odds that the true best team finishes first in the regular season and/or wins the title.

The first thing to notice is that the regular season is much better at finding the best team in every sport. If that were our only objective, we’d be better off following European football’s lead and having no playoffs at all.

The second thing that jumps out is the slopes: the lines for NBA, NHL, NFL, and NCAA Basketball look almost parallel, while college football and MLB are outliers for different reasons. College football’s best team wins the title more often than the NFL’s despite having the best regular season record at the same rate. MLB’s best team, on the other hand, wins the title less often than expected given its regular season performance. This is due to the length of the MLB regular season: at 162 games, it is almost twice as long as any other sport’s, so the actual best team more frequently finishes as the best regular season team.

Looking at the title winning chances for the best teams, college basketball is the most chaotic, while the NBA is the best at unearthing the truly best team. This is unsurprising when comparing the postseason formats for all sports: the NBA, NHL, and MLB are the only ones to use series in every round of the postseason, usually 7-game series. This tends to balance out the randomness, giving the better teams the opportunity to recover from an off day. In practice, this makes the NHL and MLB basically identical to the NFL, where each game has less randomness, but where the playoffs are single elimination. The NBA has less randomness on the game level AND 7-game series throughout the playoffs, leading to a more accurate and less chaotic result.
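To see how a series dampens single-game randomness, here is a small sketch that computes the chance of winning a best-of-n series. It assumes every game is independent with the same win probability (no home-field effect), which is a simplification, and the 55% per-game figure is purely illustrative.

```python
from math import comb


def series_win_probability(p_game: float, n_games: int) -> float:
    """Chance of winning a best-of-n series (n odd) with a fixed per-game win chance.

    Equivalent to winning a majority of n independent games.
    """
    needed = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p_game**k * (1.0 - p_game) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )


# Illustrative: a team that wins 55% of single games against this opponent.
for n in (1, 5, 7, 11):
    print(n, round(series_win_probability(0.55, n), 3))
```

The better team's edge grows with series length, which is why leagues with long series crown the true best team more often than single-elimination formats with the same per-game noise.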

Improving Playoff Format

Our analysis shows that strong surprise at the Dodgers and Braves outcome is not truly justified, but it makes sense if we are “trained” by the regular season; that is, the MLB postseason has more chaos than you would expect given how well the regular season standings align with which teams are actually the best. This isn’t necessarily a problem, but if there’s a way to make MLB’s regular season and postseason match each other more closely, as the NHL’s do, for example, then it may make for a more consistent entertainment product.

One way to match the behavior of the two periods is to make the season shorter; this would lower the odds that the best team “wins” the regular season, and MLB would end up closely matching the NHL. Fewer games means less money, though, so this is unlikely to be adopted. The reason MLB has such a long regular season in the first place is that baseball requires less recovery time than most other sports. Games are frequently played on back-to-back nights. This leads to the MLB playoffs being about half as long in days as the NBA and NHL playoffs, despite having a similar number of games played. A change that would make the playoffs a similar length to those of other sports and increase the number of games played is to make all rounds 7-, 9-, or 11-game series.

This sounds excessive at first, but when 11-game series are used in every round, the MLB starts to look much more like the other sports, and finishes in a similar amount of time, as shown in Figure 4. At the very least, increasing the first and second round series to 7 games each would be a good start.

Figure 4 – Simulations with 11-game series for MLB in every round

The other outlier is college football. The lack of chaos in college football is way too European for our most American sport. Many have pushed for expanded playoffs in the past, but this analysis shows just how out of line NCAA Football is from other major American sports. Increasing the number of playoff teams from 4 to 32 would make the college football playoffs more in line with professional football, as Figure 5 demonstrates.

Figure 5 – Simulations with MLB 11-game series and 32 NCAA Football playoff teams

It also seems surprising at first glance that college and professional basketball have such different preferences for excitement vs. accuracy in the playoffs. However, I believe the results start to make more sense when we consider the following:

• More games = more profit
• Larger audience = more profit
• More excitement = more profit

Why doesn’t college basketball add series to March Madness then? Maybe 3-game series starting in the Elite 8? I can’t say for sure, but March Madness does seem to have a universal appeal. The overload of chaos in the first weekend briefly holds the attention of many who don’t watch many sports otherwise. If they were to switch to a system that more accurately determined the best team, perhaps they would lose their broader viewership, even if most diehard fans might enjoy it more.

On the other end of the spectrum, the NBA has the most predictable postseason. For a stretch in the mid-2010s, every fan could tell you the two teams that were going to play in the championship at any point in the season (Warriors and Cavs). Why does the NBA opt for this? They must prefer more games over excitement. I think you’d see ratings rise if some 7-game series were reduced to 5-game series, but it probably wouldn’t account for the lost profit from having fewer games. I think the NBA would love to get their line down to the NFL/NHL/MLB level, but they can’t do that without sacrificing games or changing something major.

Brainstorming here, one idea would be to change the rules. NBA games have so many possessions that the noise is all but sifted out by the end of the game. They could shorten the game, or play 4-7 “mini-games,” of 6-10 minutes each, where the winner of the most mini-games wins the actual game. This would tighten up their win probability distribution graph and make their postseason like the MLB or NHL. As it is now, it’s a common complaint that there are too many blowouts in the NBA playoffs.

Conclusion

Artificially attaching meaning to random events is an American pastime. At some level, we’re all aware that it isn’t always (or even usually) the best team that’s winning our postseason tournaments. But with just the right balance, we can pretend that it is, and have more fun on the road to get there.

It’s the same reason Poker is more popular than Chess and Roulette—pure skill and pure luck can be fun, but ultimately, we like a balance of the two in our competitions.

And while the tradeoff between skill and luck is intuitive when comparing chess and roulette, it’s not as intuitive when comparing different American sports. Data, simulation, and inference are what allow us to evaluate those tradeoffs and quantify what we’re expecting to give up to change how predictable the postseason is.

If the major American sports were to define their objective in crafting a postseason format, it would be something that has as many games as possible while also ensuring that the best teams are the most likely to win, but don’t always win. Accurately defining the question you’re trying to answer is an important discipline in both data science and sports. In this case, it leads us to possible improvements in several sports:

• NCAA Football should increase the number of teams in the playoffs.
• MLB should go to a 7-game series format (or more).
• NBA should consider changes to the game that make blowouts less frequent.

At the very least, we can all be thankful we have progressed far beyond European Football on the path to the optimal playoff system…just kidding, Europe.

Appendix: Methodology

Simulating full seasons and playoffs for the 6 major American sports leagues was simplified by using the following process:

  1. Download moneyline data for all sports for the most recent completed season
  2. Use the moneyline data to estimate implied probability for every game
  3. Define the single game simulator:
    • Set a power rating for each team, such that the average team’s rating is 0, and the average distance from 0 can vary depending on the sport
    • Calculate the sigmoid of the difference in two power ratings to estimate the probability team A beats team B.
    • Simulate a game by generating a random number uniformly between 0 and 1. If this number is less than the probability team A beats team B, then team A won the game. Otherwise, team B won.
  4. Estimate the average distance from 0 for the power ratings in each sport by calculating the standard deviation of the inverse sigmoid of the implied probabilities and dividing by the square root of two. The division is needed because the inverse sigmoid of a win probability corresponds to the difference between two teams’ ratings, and the difference between two independent normal(0, 1) variables has a standard deviation equal to the square root of two.
  5. To simulate a single season:
    • Generate team power ratings for all teams in the league using the standard deviation calculated in step 4.
    • Simulate the regular season. Instead of calculating out a schedule such that every team plays every other team, we simulated the seasons by having every team play additional teams drawn from the power rating distribution. This slightly reduces the interdependence in regular season records (e.g., it would technically be possible for all teams to go undefeated), but we believe the actual difference in results is negligible.
    • Simulate the playoffs according to the different playoff structures for each league.
    • Calculate summary statistics, such as whether the team with the best power rating finished first (or tied for first) in the regular season and/or won the title.
  6. Repeat process 10,000 times for each sport and summarize results.
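
The process above can be sketched in a few dozen lines. This is a simplified illustration, not the code behind the article's figures: the league parameters are hypothetical, ratings are assumed to be normally distributed, and the playoff is assumed to be a seeded single-elimination bracket whose size is a power of two.

```python
import math
import random
from statistics import pstdev


def logit(p):
    """Inverse sigmoid: maps a win probability to a rating difference."""
    return math.log(p / (1.0 - p))


def rating_spread(implied_probs):
    """Step 4: standard deviation of the logits, divided by sqrt(2)."""
    return pstdev([logit(p) for p in implied_probs]) / math.sqrt(2)


def win_prob(rating_a, rating_b):
    """Step 3: sigmoid of the rating difference."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def simulate_season(n_teams, n_games, n_playoff, spread, rng):
    """Step 5: returns (best team had or tied the best record, best team won title)."""
    ratings = [rng.gauss(0.0, spread) for _ in range(n_teams)]
    best = max(range(n_teams), key=lambda t: ratings[t])

    # Regular season: each opponent is a fresh draw from the rating distribution.
    wins = [
        sum(rng.random() < win_prob(r, rng.gauss(0.0, spread)) for _ in range(n_games))
        for r in ratings
    ]
    best_record = wins[best] == max(wins)

    # Playoffs (assumed format): seed by wins, then single elimination.
    field = sorted(range(n_teams), key=lambda t: wins[t], reverse=True)[:n_playoff]
    while len(field) > 1:
        half = len(field) // 2
        pairs = zip(field[:half], reversed(field[half:]))
        field = [a if rng.random() < win_prob(ratings[a], ratings[b]) else b
                 for a, b in pairs]
    return best_record, field[0] == best


def best_team_rates(n_seasons, seed=0, **league):
    """Step 6: repeat the season many times and summarize."""
    rng = random.Random(seed)
    results = [simulate_season(rng=rng, **league) for _ in range(n_seasons)]
    record_rate = sum(r for r, _ in results) / n_seasons
    title_rate = sum(t for _, t in results) / n_seasons
    return record_rate, title_rate


# Hypothetical league, not one of the six leagues analyzed in the article.
record_rate, title_rate = best_team_rates(
    1000, n_teams=16, n_games=82, n_playoff=8, spread=0.5
)
print(f"best record: {record_rate:.1%}, title: {title_rate:.1%}")
```

In a real run, `spread` would come from `rating_spread` applied to a sport's implied probabilities, and the playoff function would be swapped for each league's actual format (series lengths, byes, and so on).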