The Burden of Information Part 2: Issues with Data Overload

NBA "Superstar" Boban Marjanovic

This is the second part of “The Burden of Information”.  To read the first part, which discusses how the use of statistics has led to the popularity of football, basketball, and baseball in the United States, click here: Part 1: Why Some Sports are More Popular than Others.


We live in a data-driven society, where almost everything is measured and analyzed.  The same is true for professional sports, where football, basketball, and baseball specifically have used statistics to engage fans, differentiate between players, and add strategic considerations to game planning.  However, when there is nearly infinite data at our fingertips, is information always a good thing, and do numbers really never lie?

A clear benefit of having many different metrics to measure players by is that we can get a good grasp of a player’s game and determine how good they are through comparisons to other players.  While this may be true, more data also means more room for statistical anomalies.  In basketball, the Player Efficiency Rating (PER) is a widely used formula which attempts to measure all the in-game contributions of a player on a per-minute basis, leaving one number that stands for how well a player performed.  In the 2015-16 NBA season, MVP Stephen Curry led the league in PER.  Kevin Durant finished second, Russell Westbrook fourth, and LeBron James fifth.  This makes perfect sense and adds credibility to the PER metric, as those players are all superstars and arguably the best four players in the league.  However, sandwiched between Durant and Westbrook in third place is someone with not nearly as much name recognition, and rightfully so: Boban Marjanovic.

So how did Marjanovic, a rookie backup center who had spent nearly a decade playing professionally in Europe, suddenly turn into the third best player in the NBA with the Spurs?  The answer should be obvious: he didn’t.  While Marjanovic was surprisingly efficient whilst on the court, he wasn’t on the court that often.  During the season, he played 508 total minutes, barely eclipsing the minimum of 500 minutes necessary to be eligible for the PER standings.  For comparison, the rest of the top five played over 2,500 minutes.  Marjanovic’s sample size wasn’t just small, but also skewed.  More often than not, Marjanovic played against backups, frequently in garbage time when the Spurs were either far ahead or (less likely) far behind.  Fan-favorite Marjanovic’s numbers, regardless of competition, still show that he’s an underrated big man.  That being said, he’s not close to being one of the league’s best.
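To put a rough number on the sample-size problem, here is a minimal sketch in Python.  It rests on a simplifying assumption of mine, not on how PER is actually built: it treats every minute as an independent observation, so the noise in a per-minute average shrinks with the square root of minutes played.  The function name is made up for illustration; the minute totals are the ones quoted above.

```python
import math


def rate_uncertainty_ratio(small_minutes: float, large_minutes: float) -> float:
    """Return how many times noisier a per-minute average is over the smaller sample.

    Assumption (a simplification, not part of the PER formula): each minute is
    an independent observation, so the standard error of the average scales
    like 1 / sqrt(minutes).
    """
    if small_minutes <= 0 or large_minutes <= 0:
        raise ValueError("minutes must be positive")
    return math.sqrt(large_minutes / small_minutes)


# Minutes quoted in the article: 508 for Marjanovic, roughly 2,500 for the rest
# of the top five.
ratio = rate_uncertainty_ratio(508, 2500)
print(f"A per-minute stat over 508 minutes is about {ratio:.1f}x noisier")
```

Under that assumption, a per-minute number built on 508 minutes carries a bit more than twice the uncertainty of one built on 2,500.  The sketch says nothing about the second problem, the quality of the opposition, which a simple square-root rule cannot capture.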

Marjanovic’s PER brings up an important point about statistics: numbers can be cherry-picked by anyone to show anything, regardless of the realities of the situation.  Given a haystack and enough time, a needle can be found to mislead the public.  Hypothetically, I could create a formula or find a way to make it look like anyone is a great player if I felt like it.  With so many separate metrics, how do we determine which ones are most important and the best judges of a player’s performance?

That’s the question asked every March of the NCAA Tournament Selection Committee, which determines the 68-team college basketball March Madness field and seeds each team.  Do you choose teams purely by their record?  How about using the Ratings Percentage Index (RPI) or the College Basketball Power Index (BPI)?  How greatly do you value a team’s strength of schedule?  Is a team’s performance against highly ranked teams the best predictor of how good they are?  What is the value of winning a game on the road?  Does a loss look bad even if the team’s star player was injured?  What about a team’s average margin of victory?  Choosing and ranking the best college basketball teams requires sifting through tons of numbers and deciding which ones you value the most.  However, everyone has different opinions, which is why people are angry over the committee’s final decisions every year.  Here, we aren’t burdened by a lack of information, but by a data overload.

Whatever happened to the eye test?  Do we really live in an age where it seems ridiculous to judge the strength of a basketball team by watching the team play?  The use of formulas can lead to teams looking good just because they play strong competition.  Wasn’t it important, at one time, to win games?  Data is an invaluable resource and one that should continue being used.  However, let’s not convince ourselves that some numbers mean far more than they actually do.  It already happens enough.

The sports media uses statistics incorrectly in a few different ways.  One way is through making comparisons between players.  You see it all the time: LeBron vs. Michael Jordan, Kobe vs. Michael Jordan, Tom Brady vs. Peyton Manning, Leo Messi vs. Cristiano Ronaldo, Sidney Crosby vs. Alexander Ovechkin, etc.  People will say things like “Tom Brady threw more touchdown passes this season than Peyton Manning so he’s better” or “Kobe won three championships in his first six seasons, but Michael Jordan had none in his first six seasons.”  On the surface, they sound like reasonable arguments, but in reality, they’re incredibly shallow.  What if Peyton Manning had to drive 99 yards for every touchdown pass, but Tom Brady always got the ball at his opponent’s one-yard line?  Sure, that’s an extreme example, but wouldn’t it change the way you looked at the stat?  It did take Michael Jordan a lot longer to win a championship than Kobe, but let’s not forget that Shaq joined the Lakers the same summer Kobe was drafted, while MJ had to wait three years for Scottie Pippen.  Making these huge “who’s better?” comparisons requires looking at the circumstances of each player.

Additionally, have you ever been watching a sporting event and heard a commentator throw out a ridiculous statistic straight out of left field?  Something like, “he hasn’t allowed any home runs the last five times he’s pitched on a Wednesday.”  How is that relevant?  There’s always a data analyst in the background whose only job is to find random stats and put them on the teleprompter.  Are we expected to think there’s something about Wednesdays that makes that pitcher turn from Clark Kent into Superman?  Take a moment to determine whether you’re hearing something of value or just a pure coincidence.
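The coincidence point can be made concrete with a little arithmetic.  The Python sketch below is purely illustrative and every input is hypothetical: I assume a pitcher has a 50% chance of a homerless start, that starts are independent, and that an analyst checks all seven days of the week for a five-start streak.  None of those numbers come from real data.

```python
def chance_of_some_streak(p_clean_start: float, streak_length: int, num_splits: int) -> float:
    """Probability that at least one of `num_splits` independent splits shows a
    perfect streak of `streak_length` clean starts through luck alone."""
    if not 0.0 <= p_clean_start <= 1.0:
        raise ValueError("p_clean_start must be a probability between 0 and 1")
    if streak_length < 1 or num_splits < 1:
        raise ValueError("streak_length and num_splits must be at least 1")
    # Chance that one particular split (say, Wednesdays) shows the streak.
    p_streak = p_clean_start ** streak_length
    # Chance that at least one of the splits shows it.
    return 1.0 - (1.0 - p_streak) ** num_splits


# Hypothetical inputs: 50% chance of a homerless start, a five-start streak,
# and seven weekdays to search through.
one_day = chance_of_some_streak(0.5, 5, 1)
any_day = chance_of_some_streak(0.5, 5, 7)
print(f"Streak on one chosen weekday: {one_day:.1%}")
print(f"Streak on at least one weekday: {any_day:.1%}")
```

With these made-up inputs, a streak on one pre-chosen weekday is unlikely, but the chance that some weekday produces one is several times higher.  Add more ways to slice the data (day games, months, ballparks) and finding an impressive-sounding streak becomes close to a sure thing.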

I’m not trying to drive society away from analytics.  In fact, I enjoy researching data and sharing my own findings.  Every year at the MIT Sloan Sports Analytics Conference, speakers share exciting new developments that can revolutionize the way we view sports.  Numbers and technology are giving us the ability to understand sports better than ever before.  It’s just important not to be blinded by data and to stay mindful of potential biases and irrelevant information.
