Happy Star Wars Month! 

It’s May, which makes it Star Wars Month. May the 4th be with you, readers. At Xentity, we consider ourselves data geeks, but a couple of us are simply geeks in general. Our very own Matt Tricomi once noted in a tweet that Darth Vader’s first lines in A New Hope pertained to some very important data: a technical readout of a certain space station. Meanwhile, one of our staff binge-watched the entire saga one weekend (frankly, he is the biggest geek of us all).

Afterwards, he told us something very interesting. Vader was not the only one to recognize the importance of data. Data turns out to be an important plot point in quite a few moments in Star Wars. With that in mind, let’s take a look at 5 crucial moments where data was important in a galaxy far, far away. For this piece, we will go in chronological order, from Episode I to Episode IX if necessary.

#5 The Midichlorian Count

With how divisive the prequels tend to be, let’s not get into the debate of whether the concept of “midichlorians” ruined the mystique of Star Wars. Instead, let’s focus simply on what went into the moment in Episode I, The Phantom Menace, where Qui-Gon Jinn asked for a midichlorian count from young Anakin Skywalker’s blood. Obi-Wan Kenobi discovered, upon analysis, that Anakin’s midichlorian count was the highest ever recorded, even higher than Yoda’s. This lent credence to Qui-Gon’s theory that Anakin was the “Chosen One.”

By contrast, the other entries on this list are larger in scale. However, this one still brings up a couple of very interesting concepts to consider. First, there’s the actual midichlorian count that Obi-Wan runs. He was basically cleaning and modeling data to extract meaningful information and to help Qui-Gon make what we, looking in from the outside, would call a ‘data-informed decision’. In other words, Obi-Wan was doing basic data analysis. He took the data from the blood sample Qui-Gon sent him and extracted the crucial piece of information: Anakin’s count was over 20,000, the highest ever recorded.

The fact that Obi-Wan points out that Anakin’s count was the highest implies one very important thing: the Jedi possess some sort of easily accessible database that holds records of midichlorian counts. Anakin’s count was not only entered into a database, but it also surpassed every other entry in that database. This scene was brief, but it showed more than data analysis. It also showed the use of a database that lent further credence to Qui-Gon’s theory. It’s as though even pivot tables in Excel are alive and well in the galaxy far, far away.
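For the data geeks, here is a minimal sketch of what that lookup could look like, using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the table name, the column names, and every count except the "over 20,000" threshold are made up, and nothing here comes from the films.

```python
import sqlite3


def is_highest_on_record(conn, new_reading):
    """Return True if new_reading exceeds every reading already stored."""
    (current_max,) = conn.execute(
        "SELECT MAX(reading) FROM midichlorian_records"
    ).fetchone()
    # MAX() yields NULL (None in Python) when the table is empty.
    return current_max is None or new_reading > current_max


# Hypothetical in-memory stand-in for the Jedi database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE midichlorian_records (name TEXT PRIMARY KEY, reading INTEGER)"
)
# Illustrative, made-up readings for existing entries.
conn.executemany(
    "INSERT INTO midichlorian_records (name, reading) VALUES (?, ?)",
    [("Yoda", 17000), ("Obi-Wan Kenobi", 13000)],
)

# A reading "over 20,000" beats every stored entry.
anakin_is_highest = is_highest_on_record(conn, 20001)
```

The point is not the SQL itself. A claim like "highest ever recorded" only means something when there is a trusted set of records to compare against.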

#4 The Planet That Was Missing From The Archives

“Impossible! Perhaps the archives are incomplete?” Big Star Wars fans clearly remember that particular line from “Episode II, Attack of the Clones”. Of course, the prequels in general were among the most quotable of the Star Wars movies (not always for the right reasons, unfortunately). However, this line is not just a popular internet meme. It is also one of many moments where data became a crucial plot point.

For context, Obi-Wan Kenobi visits the Jedi Archives to find Kamino, a planet that could give him a lead on a bounty hunter the Jedi are after. We later discover that Kamino is the birthplace of the Clone Army the Jedi go on to use. However, Obi-Wan soon finds that Kamino is nowhere in the archives. Of course, the archives have to be incomplete, right? That alone is not the biggest issue. The real issue is that the Jedi Archives suffered from a lack of data veracity, and clearly of data security, because a traitor to the Jedi Order had removed the planet from the archives.

Veracity and Security

Data veracity is the degree to which data is accurate, precise and trusted. We often view data as reliable and certain. However, the reality is that data is often imprecise, uncertain and therefore unreliable because data is often incomplete and incorrect, as we see in “Attack of the Clones”. Data veracity becomes especially important in big data, which deals with extremely large sets of data. Like, you know, all the planets and systems in a massive, ever-expanding fictional universe. Because the Jedi Archives’ data veracity was lacking, Obi-Wan’s search for the correct information had stalled.

Data security is a big issue here as well. Count Dooku, who had betrayed the Jedi Order to form the Separatist movement that would oppose the Jedi and the Galactic Republic, removed the data on the planet. If the Jedi Order had placed better security measures on their own archives, or if the archives had been regularly backed up (a basic concept in using data nowadays), Kamino would not have simply vanished from them. But the prequel trilogy would not be about the Jedi Order falling to an empire if the Jedi were competent. This lack of data security and data veracity, along with the failure to do something as basic as backing up their data, is just one of many failures the Jedi had in the prequels (not just the obvious ones).
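To make the backup point concrete, here is a minimal sketch in Python of how a backup could have exposed the deletion. It assumes the archive can be reduced to a simple set of planet names, and the function name and sample data are hypothetical, chosen only for illustration.

```python
def missing_from_archive(live_archive, backup):
    """Return entries found in the backup but absent from the live archive."""
    return sorted(set(backup) - set(live_archive))


# Hypothetical snapshot taken before the records were tampered with.
backup = {"Coruscant", "Kamino", "Naboo", "Tatooine"}

# The live archive after Kamino's entry was removed.
live_archive = {"Coruscant", "Naboo", "Tatooine"}

removed_entries = missing_from_archive(live_archive, backup)
print(removed_entries)  # ['Kamino']
```

A check like this only helps if the backup itself is stored somewhere the person tampering with the archive cannot reach.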

#3 Secure Your Data Schematics, Empire!

We all remember how the original Star Wars began. The Empire was chasing down a small rebel ship, because that rebel ship had the stolen plans to the Death Star. Of course, we did not know how the theft was done for years, until Rogue One was released. And what did we find? Well, we found out that hacking is apparently still a big problem in the galaxy. The theft of the Death Star plans was quite heroic. But the event itself also exposed the Empire’s own “IP,” so to speak, along with the flaws in its data security.

This one in particular will hit close to home, because we all know how much of a pain it is whenever there’s a large-scale hack of a well-known organization or system. There’s a very good reason why data security and cybersecurity are becoming very important jobs in an ever-expanding technological world. The Empire did not learn that in Rogue One. Throughout the “Rebellion vs. Empire” films, one thing is quite constant: the Empire is terrible at cybersecurity. We see old clearance codes that still ‘check out’ in Return of the Jedi, data leaks (intentional or not), and data that can easily be hacked into and stolen.

More Bad Data Security

You have to constantly monitor and update data security, or hackers can easily find vulnerabilities and attack them. That is exactly how the Rebellion got their hands on the plans. They found a hole in the Empire’s security and hacked into it. And then they proceeded to use what was basically the Star Wars version of the Cloud to airdrop the schematics to the Rebellion. Can you imagine how different the movies would have been if the Empire had been a bit more competent in securing its data (never mind the numerous other areas it could have been more competent in)? Speaking of which…

#2 The Death Star Plans

This brings us to “A New Hope”, the movie that started it all. It was the movie that introduced us to memorable and beloved characters, and to the idea that even the mystical and spiritual had a place in a sci-fi universe. It was also the first of the Star Wars movies to show us how good data can make or break an operation, whether it was the Empire’s operation to rule the galaxy with an iron fist, or the Rebellion’s operation to destroy the Death Star and free the galaxy.

We have already touched on the Empire’s criminally irresponsible data security and how it cost them dearly. So we will focus on what that data did for the Rebellion instead and what data concepts were demonstrated. This is another great demonstration of data analysis, like the aforementioned midichlorian count. We all remember the big ‘prep’ scene, where they showed Luke and the other rebel pilots a full readout of the Death Star. And then they see the very crucial piece of information: that two-meter exhaust shaft.

The One in a Million Shot

You know the one. The one where a precise torpedo blast will set off a chain reaction and blow up the entire Death Star. When you really stop to think about it, there are some unsung heroes here. Whoever analyzed the data extracted the key piece of information that allowed Luke to blow up the Death Star. Basically, having a collection of data is one thing. You have to be able to extract the information you need for it to be of any use. The data analysts of the Rebellion? They are the unsung heroes of this whole story.

Honestly, it is no wonder that even Vader was so frantic about getting the plans back. He knew that in this technologically and intellectually advanced universe, the rebels could find a weakness with those plans. We said it before: the right kind of data can make or break an operation. Other concepts, such as data security, can as well. The Empire’s data security was clearly awful if a ragtag rebellion was able to hack in. Consequently, the Rebellion had access to exactly the data they needed.

#1 Where In The Galaxy Is Luke Skywalker?

The world was certainly shocked when the first episode of the new trilogy began with “Luke Skywalker has vanished” in the opening crawl. And then we learned the details. A map contained Luke’s location. To avoid being found, Luke broke it into pieces. The Resistance came across most of it at the beginning of the movie. Then we see at the end that R2 had the missing piece, which revealed Luke’s exact location.

So, we’ve got a couple of interesting data concepts in this scenario worth visiting. First, we revisit the data veracity issue that we first saw in our “Attack of the Clones” moment. With the data incomplete, the map was too unreliable to serve the Resistance’s goal: finding Luke Skywalker. Furthermore, one of our technology focuses is geospatial data and solutions, so the idea of an incomplete map really spoke to us. One of our major clients is the US Geological Survey (USGS). Could you imagine if the USGS provided an inaccurate or incomplete map of an area within the US? Which brings us to a statistics concept called “missing data” (which is also considered a data veracity concept, but the idea is large enough to warrant its own discussion).

And This Matters Because?

Missing data occurs when no data value is stored for a variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions drawn from the data. Again, the map data was incomplete, and therefore no proper conclusion could be made about Luke’s location. So what do we learn from this? Make sure your data is complete and conclusive. Otherwise you might end up on Tatooine when you meant to go to the moons of Endor.
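As a rough illustration of the idea, the Python sketch below checks an observation for missing values before anyone acts on it. The field names and values are invented for this example, and treating None as the marker for "no value stored" is an assumption.

```python
def missing_fields(observation, required_fields):
    """Return the required fields that have no stored value (absent or None)."""
    return [field for field in required_fields if observation.get(field) is None]


def is_complete(observation, required_fields):
    """An observation is complete when no required field is missing."""
    return not missing_fields(observation, required_fields)


# Hypothetical map record: one coordinate was never stored.
required = ["target", "x", "y"]
partial_map = {"target": "Luke Skywalker", "x": 42.0, "y": None}

# Adding the final piece supplies the missing coordinate.
full_map = {**partial_map, "y": 7.5}

gaps = missing_fields(partial_map, required)
print(gaps)  # ['y']
```

Checking for gaps up front is what keeps an incomplete record from quietly turning into a wrong conclusion.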

The Big Takeaway

These little things that show up in Star Wars provided us with a fantastic insight. They serve as proof that data is everywhere and important to our lives, just like we so often claim, even in pop culture like Star Wars. And not only is data everywhere, but it also shows up in moments that are extremely crucial to the plot. Missing archives, stolen plans, missing map pieces, midichlorian counts: all of these served to advance a fantastic narrative, just like data can advance an organization’s interests today. At Xentity, our entire goal is to help improve your organization through data consulting. To continue with the terminology we have been using in this blog, think of it this way: we could be your Obi-Wan, young Padawan.