Why Advanced Metrics Don’t Always Work

Dylan Snyder

This past weekend I had the pleasure of traveling to Boston to attend the MIT Sloan Sports Analytics Conference with some of the best and brightest in the world of sports media and statistics. While I was there I noticed some interesting trends that really surprised me. First and foremost, there was a noticeable lack of baseball-oriented analytics panels and presentations. Given the popularity of Michael Lewis’ Moneyball, I expected baseball to be a main attraction. You hear so much about wins above replacement (WAR) and other baseball metrics that are now standard vocabulary for even casual fans that I assumed there would be some new breakthrough in how to analyze baseball players. Then came the statement that silenced the room. Football Outsiders’ Aaron Schatz said, “Baseball analytics are exhausted, everyone has the same data and the same ranking systems. The ideas are different, but there is no more room to grow.”

Then I had to ask myself, “Why did baseball analytics reach this point while football, hockey, and basketball are still struggling to get the analytics they need to evaluate not only free agents, but also draft picks and minor leaguers?” It quickly dawned on me that the more players there are on the field, court, or ice, the harder it is to properly analyze the value of a single player. Baseball is unique in that, on offense, each player performs by himself against the entire opposing team. It is solely the batter and the pitcher until the ball is in play. Batters don’t have to worry about a backdoor cutter filling the lane where they planned to drive, a teammate setting a poor screen that lets the defense slide into position, or a pulling guard tripping over a downed lineman and causing a massive pileup. Each sport faces its own challenges when trying to rank players by how valuable they are to a team.

Part of the problem for a sport such as hockey was illustrated by the runner-up research paper, “Total Hockey Rating (THoR): A comprehensive statistical rating of National Hockey League forwards and defensemen based upon all on-ice events.” The authors measured how hits, shots, blocked shots, and several other events affected the probability of a goal being scored by either team, then ranked players on their ability to produce positive events and avoid negative ones. However, the authors admitted that THoR is not a perfect metric. For example, a blocked shot was treated as a negative play, even though blocking an opposing player’s shot is obviously a good idea because it prevents that shot from scoring. It had to be counted as negative, however, because the team whose shot was blocked still had better odds of scoring than the team that blocked it. Conversely, if a defender simply gets out of the way of a shot and the shot happens to score, his THoR stays the same, even though he failed to stop a goal. This ties in with what former Magic coach Stan Van Gundy said: “Defense will be impossible to quantify until you guys [analytics people] know our systems.” If a hockey coach instructs his defenders to get out of the way of shots to give his goalie clearer sight lines, the defender did the right thing; but if the coach wants defenders to eat shots to cover for an inept goalie, that player should be penalized in his rating. We also don’t know whether the defender was late getting to his mark because he was chasing a loose puck or making up for another player’s mistake. There is too much going on for a single metric to truly capture a player’s value.
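To make the blocked-shot problem concrete, here is a toy sketch of event-based rating in Python. It is a simplified, hypothetical illustration of the general idea described above (weight each event by how it shifts the scoring odds for each team, then sum a player’s events), not THoR’s actual model. The names `EventEffect`, `EFFECTS`, and `rate_player`, and every probability in the table, are invented for illustration, and the model assumes only events a player is directly credited with count toward his rating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventEffect:
    """Hypothetical scoring odds after an event, from the credited player's team's view."""
    p_goal_for: float      # chance the player's own team scores next (invented)
    p_goal_against: float  # chance the opponent scores next (invented)

    @property
    def value(self) -> float:
        # Net change in scoring odds attributed to the event.
        return self.p_goal_for - self.p_goal_against


# Invented probabilities for illustration only; NOT taken from the THoR paper.
EFFECTS = {
    "shot": EventEffect(p_goal_for=0.06, p_goal_against=0.02),
    "hit": EventEffect(p_goal_for=0.03, p_goal_against=0.025),
    # After a block the opponent usually keeps the puck in the zone,
    # so its odds of scoring stay higher and the event nets out negative.
    "blocked_shot": EventEffect(p_goal_for=0.02, p_goal_against=0.05),
}


def rate_player(events: list[str]) -> float:
    """Sum the net value of every event credited to a player (toy model)."""
    return sum(EFFECTS[e].value for e in events)


blocker = rate_player(["hit", "blocked_shot"])
dodger = rate_player(["hit"])  # stepped aside; in this toy model no event is credited to him
print(f"blocker: {blocker:+.3f}, dodger: {dodger:+.3f}")
```

Under these made-up numbers the defender who sacrifices his body rates below the one who steps aside, which is exactly the quirk the authors acknowledged: the metric sees the odds after the event, not what the coach asked the player to do.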

Another area where advanced metrics have had a lot of success is the offensive side of basketball. The ability to determine who is worth playing on the offensive end of the court is a valuable asset to any coach or GM. Using hot and cold zone charts, you can see how effective a player is when trying to score. However, basketball players don’t specialize in either offense or defense the way they do in football. Players must be capable of both scoring and defending at a high level. In a panel on randomness and luck, Houston Rockets GM Daryl Morey was pressed about why they gave up on Steve Novak, one of the best shooters in the NBA. He responded, “We didn’t! We still…uh… his defense.” By advanced metrics, Novak was one of the most efficient scorers in the league and would have been a valuable asset to any team. But Novak struggled on defense, and his inability to play even good team defense meant he and the Rockets had to part ways. The very outspoken Stan Van Gundy also talked about another player known as a poor defender, J.J. Redick, who played for him in Orlando. Van Gundy admitted he knew that Redick was smaller and less athletic than many other shooting guards in the NBA. Taller, more athletic players could shoot over the top of Redick without too much trouble; Van Gundy didn’t need the numbers to tell him that. To the surprise of the audience, Van Gundy came out swinging in Redick’s defense: “Look, J.J. might not be the best one on one defender, but that doesn’t mean you can’t have a good defense with J.J. there. He [Redick] will never miss a rotation; he will be where you need him doing what you tell him to do. I’d rather have him doing what I say than some athlete who misses his rotations constantly and messes everything up giving teams open layups.” This is where new-school numbers are trumped by old-school knowledge.
There will never be a scenario where a team can be built by a computer, and models will never provide a coach with optimal lineups. A model can’t tell you whether two players’ personalities are going to mesh. And once again, the more people who have to get along, the harder it will be to create scenarios where models can replace conventional wisdom.

Then there is football, considered by many to be the ultimate team sport, where one player’s success depends almost completely on the play of those around him. Quarterbacks can’t be great unless they have time to throw, and safeties can’t play the way they should unless they trust the front seven playing in front of them. The best example raised during the football analytics panel, which featured several NFL personnel men and the previously mentioned Aaron Schatz, was the challenge of evaluating the Seattle Seahawks’ wide receivers. Conventional football wisdom says a quarterback has to get the ball out of his hands in about three seconds, which means that, depending on his release, he has to start his throwing motion at about two and a half. Seattle’s Russell Wilson, however, holds the ball for an average of 3.64 seconds. This extra time gives his receivers the ability to make an extra cut, or dig into their break harder, knowing the ball still has a chance of getting there. How can you compare receivers who have 3.64 seconds to get open with those on a team like the Eagles, whose offensive line forces the quarterback to get rid of the ball even faster than three seconds? At the 3.5-second mark, Wilson is stepping up in the pocket ready to launch a bomb to Golden Tate, whereas Michael Vick or Nick Foles is lying on his back with a 300 lb lineman on top of him. The result is the Dance of the Sugar Plum Fairy for Seahawks fans, but broken ribs for Vick. On the other end, Tate looks like a hero for hauling in a thirty-yard pass, while DeSean Jackson is left holding his head in frustration. Conventional and advanced statistics say that Tate is the better receiver here, but if you ask personnel directors who the better receiver is, the answer will most likely lean heavily toward Jackson.
This inability to isolate the reason for the outcome of a play, combined with the difficulty of coming up with some sort of “if everyone played with the same ten players” benchmark, means football will continue to have difficulty ranking players. People like Schatz have come up with ways to measure how a team is doing, and have attempted to show how individual players perform regardless of the rest of their team, but predicting how a player will do in the future is still an issue. We saw this firsthand when data analyst Nate Silver predicted a Seahawks vs. Patriots Super Bowl. Both teams were tops in their conference in the advanced metrics that both Football Outsiders and Silver were looking at. However, the numbers didn’t account for Pete Carroll giving Matt Bryant another chance at a game-winning field goal, or Joe Flacco making one of the best postseason runs in NFL history. A sixteen-game season is too small a sample for an accurate reading, and that, combined with the difficulty of separating one player from the rest, will continue to plague football metrics.
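A quick back-of-the-envelope sketch shows why sixteen games is such a thin sample. It assumes, as a deliberate simplification, that every game is an independent coin flip the team wins with the same true probability; the function name `win_pct_std_error` is hypothetical. Under that assumption, a true .500 team’s observed winning percentage over 16 games has a standard error of .125 (two wins either way), roughly three times the ~.039 spread over a 162-game baseball season.

```python
import math


def win_pct_std_error(p: float, games: int) -> float:
    """Standard error of an observed winning percentage.

    Simplifying assumption: each game is an independent coin flip that the
    team wins with the same true probability p.
    """
    return math.sqrt(p * (1 - p) / games)


# Compare an NFL regular season with an MLB regular season for a true .500 team.
for games in (16, 162):
    se = win_pct_std_error(0.5, games)
    print(f"{games:>3} games: +/- {se:.3f} in winning percentage")
```

Real games are not coin flips, so treat this only as a rough sense of scale: a lot of what separates a 10-6 team from an 8-8 team can be noise.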

Baseball has become so saturated with advanced metrics that they now serve mainly as a tool for drafting more safely, yet they still don’t really let poor teams compete with rich ones. Rich teams can simply spend more money on the most effective players. Of course, some teams will still pay for star power at the expense of batting average, but the process has become about as scientific as sports will allow. This is due to baseball’s unique style of play, however, and until the other sports can completely isolate individual performance, they will keep some of the “what were they thinking” magic that has been part of sports since their beginnings.