BigData is not just size and speed of complex data – it is moving us from information to knowledge
As our article Why we focus on spatial data science discusses, the progress of knowledge fields – history to math to engineering to science to philosophy – like the individual pursuit of knowledge, is based on moving from experiments to hypotheses to computation and now to The Fourth Paradigm: Data-Intensive Scientific Discovery. This progression has happened over the course of human history and is now playing out again, in abstracted form, on the internet.
The early 90s web was about content, history, and experiments. The late 90s web was about transactions, security, and eCommerce. The 2000s web was about engineering entities breaking silos – within companies, organizations, sectors, and communities. The 2010s web has been about increasing collaboration in communication and work production, and entering into knowledge collaboration. The internet's progression is simply emulating the way human capabilities developed over history.
When you are ready to move into BigData, it means you want to answer new questions.
That said, the BigData phenomenon is not about the input of all the raw data, or the explosion that the Internet of Things is being touted as. The resource sells, and the end product is the consumed by-product. So let’s focus on that by-product – it’s knowledge. It’s not about the speed of massive amounts of new, complex data of varying quality, which is what our discussion of IBM’s 4 V’s focuses on.
It’s about what we can now do cheaply with technology that used to require supercomputer clusters only the big boys had. Now with cloud, internet, and enough standards, if we have good and improving data, we ALL have the environment to answer complicated questions while sifting through the noise. It’s about enabling the initial phase of knowledge discovery, the very thing everyone is complaining about on the web right now: "too much information" or "drowning in data".
The article on Throwing a Lifeline to Scientists Drowning in Data discusses how we need to be able to "sift through the noise" and make search faster. That is the roadblock, the tall pole in the tent, the showstopper.
Parallelizing the search is the killer app – this is the Big Deal, we should call it BigSearch
If you have to search billions of records and map them to another billion records, doing that in sequence is the problem. You need to shorten the time it takes to sift through the noise. That is why Google became an amazing success out of nowhere. They did it, and are still doing it, better than anyone else – sifting through the noise.
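To make the sequence-versus-parallel point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `match_chunk`, `sequential_match`, and `parallel_match` are hypothetical, the toy data stands in for the billions of records, and a thread pool is used only to keep the example short. For CPU-heavy matching you would more likely reach for a process pool or a distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor


def match_chunk(chunk, index):
    """Map each record in one chunk to its partner in the other dataset."""
    return [(key, index[key]) for key in chunk if key in index]


def sequential_match(records, index):
    """Baseline: a single worker walks every record in order."""
    return match_chunk(records, index)


def parallel_match(records, index, workers=4):
    """Split the records into chunks and search the chunks concurrently."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results line up with the input.
        parts = pool.map(match_chunk, chunks, [index] * len(chunks))
        return [pair for part in parts for pair in part]


# Hypothetical toy data standing in for the two large record sets.
sensor_ids = list(range(20))
station_by_id = {i: f"station-{i}" for i in range(0, 20, 3)}
matches = parallel_match(sensor_ids, station_by_id)
```

The design choice that matters is the chunking: because each chunk can be searched without knowing anything about the others, the work can be handed to as many workers as you can afford, and the answer is the same as the sequential one.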
The United States’ amazing growth is because of two things – we have resources and we found out how to get to them faster. Each growth phase of the United States was based on that fact alone, plus a bit of stopping the barbarians at the gates, or ourselves from implosion. You could say civilization. Some softball examples out of hundreds:
- Westward expansion exploded dramatically after trains, which allowed for regional foraging and mining
- Manufacturing dramatically exploded production output, which allowed for city growth
- Engines shortened time between towns and cities, which allowed for job explosion
- Highway systems shortened time between large cities, which allowed for regional economies
- Airplanes shortened time between the legacy railroad time zones, which allowed for national economies
- Internet shortened access to national resources internationally, which allowed for international economies
- Computing shortened processing time of information, which allows for micro-targeted economies worldwide
Each "age" resulted in shortening the distance from A to B. But Google is sifting through data. Scientists are trying to sift as well: working through defined data sensors, linking them together, and asking very targeted simulated or modeled questions. We need to address the barriers limiting entities’ success in doing this.