Sports Join the Ranks of Leading Data Science

Gone are the days of jocks and nerds living in two different worlds. You probably don’t picture scrawny computer nerds working with data when you think of sports, yet that has changed. Data science has historically been led by finance, energy, and intelligence: all well-funded fields that need to find small differentiators. Sports are games of inches, and they involve big money too. We wanted to have some fun and dig into the three cases below, looking at their past, present, and future. Hopefully you will have fun as well and come away with some interesting facts.

  1. In baseball, over 15 years ago, the Moneyball concepts (around since Bill James started his hobby baseball data analysis in the late 70s) became popular and began to turn America’s pastime into a true numbers game.
  2. In American football, teams gather data on the physical attributes shown at the NFL Combine, such as 40-yard dash times and bench presses.
  3. And there have been some fun fails. Hey, science is about learning from what came before. In hockey, for example, there was an attempt to bring data visualization to broadcasts through augmented reality: the infamous glowing puck (true story).

Case #1: The NFL – The Combine, the Super Bowl, and Pro Football Stats

The Past – Stat Keeping and the Combine’s Beginnings

It started with basic player stats, which have been kept since the 1920s. Over the years, other stats arrived to measure a player’s success. The most notable addition was the QB sack, which was introduced in the 1982 season, the same year the NFL Combine was introduced. The Combine, of course, is the big scouting event for college prospects before the draft. It changed data science in football because it added another dimension to what scouts could analyze in a prospect. Before the Combine, the best way to scout a player was to examine game film and attend college games in person.

The Combine, however, gave pro scouts the ability to gather data on players’ physical performance in drills such as the 40-yard dash and the bench press. Scouts and the media place great emphasis on Combine report cards, and scouts and analysts gush over the physical marvels put on display there. For data scientists, new kinds of data are fantastic for the profession. The Combine, in that regard, was a blessing for data science in sports.

Yet the Combine does not always pick winners, and the story of Tom Brady’s Combine is legendary. He received an F (12 out of 100), which resulted in him being drafted 199th overall, in the 6th round, behind 6 other QBs. His Combine performance still holds many lowlight records. It shows that what scouts measure can result in garbage in, garbage out, and that statistics cannot always find the diamond in the rough.

The Present – Next Gen Stats, Amazon Web Services and Much More

The big one here is the partnership between the NFL and Amazon Web Services (AWS). This partnership has allowed data science to keep evolving with the sport and has improved how it is practiced within it.

This began with the creation of the NFL’s Next Gen Stats, also known as NFL player tracking. The system captures real-time location, speed, and acceleration data for every player on every play. Sensors throughout the stadium track tags placed on players’ shoulder pads, charting individual movements to within inches. After laying the groundwork in 2013, the NFL introduced the technology in all venues in 2015 and began to share the stats on broadcasts and in stadiums.

Later, Next Gen Stats partnered with AWS. The partnership sought to improve the entire system through machine learning and data analytics services. Using the data from the RFID tags on players’ shoulder pads, Next Gen Stats and AWS produce predictive statistics. These include expected rushing yards, expected yards after the catch, and even passing route classification.

Data science in pro football also extends to the Super Bowl, including predicting the winner. FiveThirtyEight has applied the Elo rating system, best known from chess and soccer, to predict which team has the edge in the Super Bowl. Considering they gave the Chiefs an edge this year, the process still needs a bit of work. We also can’t forget how fantasy football uses data to project how certain players will do in the upcoming season.
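For the curious, here is a minimal sketch of how an Elo prediction works. It uses the standard chess-style Elo formula, not FiveThirtyEight’s actual NFL model (which adds adjustments of its own), and the ratings and the K-factor of 20 are hypothetical values chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return new (A, B) ratings. score_a is 1.0 for an A win, 0.5 for a tie, 0.0 for a loss."""
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever one team gains, the other loses.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings, for illustration only.
favorite, underdog = 1700.0, 1600.0
print(f"Favorite win probability: {expected_score(favorite, underdog):.3f}")

# An upset (the favorite loses) moves rating points to the underdog.
print(update_ratings(favorite, underdog, score_a=0.0))
```

The takeaway is that an Elo “edge” is only a probability. A favorite with a higher rating still loses a meaningful share of the time, which is how a prediction can look wrong after a single game.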

The Future – Changes in Play Calling and New Organizations With Bold Goals

Right now, we’re seeing two major changes that come from closely applying data science to the sport. First, aggressive play calls have increased this season as teams analyze how often they succeed. Second, Pro Football Focus, an organization created to gather data in a sport that ‘desperately needed it,’ strives to improve its own analytics by applying new factors. With technology already in place to improve data capture and fan immersion, the next logical step would seem to be adjusting data-keeping strategies to improve the data itself, along with football teams building their game plans around data trends.

Case #2 – Major League Baseball Scouting

The Past – The Dynamic Profession of Baseball Scouting

Where to begin here? Baseball scouting is a profession that has changed a ton over the years. To start with, there was barely any scouting in the 50s. Then, over the following decades, the introduction of the baseball draft brought something of an overhaul, and scouts had to really analyze young players. There was very little data and a lot of gut instinct.

Then came the 90s, when scouts were given a grading system to report back to the organizations they worked for. They called this a “Prospect Formula,” formed from “ability grades” given by the scout along with extra points based on the scout’s intuition. For context, Derek Jeter only scored what would qualify him as a “good” prospect.

Then there’s Moneyball. For anyone who has neither read the book nor seen the movie, the concept is quite simple. Using statistical analysis, small-market teams can compete by buying assets that are undervalued by other teams and selling ones that are overvalued by other teams. The best-known Moneyball theory claims baseball undervalues on-base percentage and overvalues sluggers. 
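To make the on-base percentage idea concrete, here is a small sketch using the standard formulas for batting average and on-base percentage. The two hitters and their stat lines are made up for illustration; they are not real players.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


# Hypothetical stat lines, invented for this example.
hitters = {
    "Hitter A": dict(hits=150, walks=30, hit_by_pitch=2, at_bats=500, sac_flies=4),
    "Hitter B": dict(hits=130, walks=90, hit_by_pitch=5, at_bats=480, sac_flies=5),
}

for name, s in hitters.items():
    avg = batting_average(s["hits"], s["at_bats"])
    obp = on_base_percentage(**s)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

In this made-up example, Hitter B has the lower batting average but reaches base more often thanks to walks. That is the kind of player a Moneyball-style approach would flag as undervalued.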

The Present – Lagging vs. Leading Indicators

Leading indicators are sometimes described as inputs. They define what actions are necessary to achieve set goals with measurable outcomes. They “lead” to successfully meeting overall business objectives. That is why we call them “leading”.

By contrast, a lagging indicator measures current production and performance. While a leading indicator is dynamic yet difficult to measure, a lagging indicator is easy to measure yet hard to change. They are opposites. Many compare a lagging indicator to an output metric.

In Baseball…

Baseball clubs have lately focused on “lagging and leading indicators.” Take the theory that managers should generally remove starting pitchers once their pitch counts exceed 100. Why? Because statistics show a strong correlation between high pitch counts and overuse injuries: pitch count is a leading indicator of injury. The data also shows that a “fresh” arm coming in from the bullpen generally has better outcomes than a pitcher who has been in the game quite a while. Likewise, if a greater number of hits (the leading indicator) correlates with a greater number of runs, baseball clubs would focus more on scouting players with higher batting averages and on-base percentages.
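As a rough sketch of how a club might check whether hits really do track with runs, the snippet below computes a Pearson correlation coefficient by hand. The season totals are invented for illustration and are not real team data; a value near 1 would suggest hits are a useful leading indicator for runs, though correlation alone does not prove cause.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up season totals for five hypothetical teams.
team_hits = [1300, 1350, 1400, 1450, 1500]
team_runs = [640, 690, 700, 760, 790]

r = pearson(team_hits, team_runs)
print(f"Correlation between hits and runs: {r:.2f}")
```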

The Future – A Reason to Fear?

There are plenty who would argue that baseball scouting is a profession in danger. Organizations might ask why they should spend money on a scout when they can simplify things by relying on lagging and leading indicators or Moneyball theory. How baseball scouting evolves with data science will ultimately determine its future in the sport. In reality, baseball scouts are being forced to rely more on data analytics to properly analyze their prospects.

Case #3 – Data Science Fails

The Past – The Infamous Glowing Puck

For the more lighthearted portion of the piece, we would like to visit the moments when data science fails in sports. This can be due to a bad application of technology or to poor analytics, or as we like to call it, garbage in, garbage out. In this case, it is the former. A lot of hockey fans probably know immediately what we are talking about when we mention the infamous glowing puck. This “glowing puck” was known as FoxTrax, an augmented reality system.

Back in 1996, FoxTrax was Fox Sports’ attempt to help people see the puck better by making it glow on their screens. The system could also change the color of the glow based on the speed and power of the shot.

FoxTrax received brutal criticism for being cheesy and distracting. Some even claimed it made hockey look like it had a ‘Mighty Morphin’ Power Rangers budget’. Fox Sports discontinued FoxTrax in 1998 after a long, icy reception. Pun intended. This left behind a very important lesson. No matter how much data science and analytics one can apply for the sake of immersion, sports are entertainment. Immersion should not distract people from the game.

The Present – The Emotional Factor

Next, we will talk about factors that data cannot predict yet. The big one in sports is the “emotional factor.” In the NBA, there was a period of time when the Atlanta Hawks were one of the highest-scoring teams in the league. Then they added a new all-star to make the team even better. We see this in every sport: the assumption is that players who produce crazy numbers and stats will add up to the best teams in the league. However, the Hawks instead found themselves declining. After the team exited the playoffs in the first round a few years later, the new guy left. It soon came out that he disliked his role in the offense, the new coach, and so on.

What do we take away from this experience? Good players who produce crazy stats don’t automatically equal championship teams. There really are factors that data science cannot account for yet, and the emotional factor is the big one.

The Future – More AR and Maybe VR? We’ll See How That Turns Out

In theory, after the infamous failure of FoxTrax and augmented reality in hockey, there would be some hesitation to try again with AR. However, that is not the case. Many leagues are working to incorporate AR into their sports now. We are already seeing some success with first down lines shown in football and with ads overlaid on the ice near the blue line. We can even track a baseball in flight in replays. Ironically, we learned those lessons from the glowing puck.

The big thing right now is AR glasses. MLB plans to integrate these glasses with its Hawk-Eye stats tracking system. The hope is to enhance the fan experience with real-time data delivered through augmented reality. The question now is, if MLB implements this, will it be distracting like FoxTrax? Or will the glasses look so ridiculous or feel so uncomfortable that nobody will want them on? Thankfully this is a bit further into the future. There is plenty of time for trial and error, much like any sort of “data science experiment.”

So if you like sports, are you excited about greater immersion? Or are you just excited about data-driven concepts in sports? We are excited about both because at Xentity, we love innovative practices. We strive to put the “I” back in “IT” and “GIS” and look forward to seeing how professional sports manage to do just that.