[Originally posted in May 2013 – we’re finding this article’s content is as viable today as it was then, so we’re re-posting it]
The “I” in Information Technology is incredibly broad, so why do we focus first on “spatial data” when approaching integrated data science problems? Why does it come first in our data areas: Geospatial, Open, Big, IoT? We get this question a lot. Our reason for focusing on geospatial is that it is multi-dimensional: it crosses many different ways of thinking, audiences of varying maturity, progressions, sciences, models, and times.
Analyzing the History of Information
From the perspective of time, we know that technology is developing extraordinarily quickly, roughly following Moore’s Law. While the long-term applicability of Moore’s Law has been questioned recently (MIT declared Moore’s Law dead in 2016, then IBM noted it was alive and well again in 2017), this rapid doubling – 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16,384, and so on – has produced roughly a 10,000-fold increase in computing and storage capability in fewer than 20 years, with the overwhelming majority of that increase occurring in the last 10 years alone.
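To make the doubling arithmetic concrete, here is a minimal Python sketch, assuming an 18-month doubling period (one common reading of Moore’s Law); the function name and figures are ours, for illustration only:

```python
# Minimal sketch of Moore's-Law-style growth, assuming capability
# doubles every 18 months (an assumption, not a physical law).

def fold_increase(years: float, doubling_period_years: float = 1.5) -> float:
    """Fold-increase in capability after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)

if __name__ == "__main__":
    print(f"10 years: ~{fold_increase(10):,.0f}x")  # ~102x
    print(f"20 years: ~{fold_increase(20):,.0f}x")  # ~10,321x
    # A roughly 10,000-fold gain in 20 years, with nearly all of the
    # absolute increase arriving in the final decade.
```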
To show this progress in terms of the “I” in “IT”, we have put together a few views of this rapid curve, charting the progression from data, to information, to knowledge, and to wisdom (y-axis) over time (x-axis).
[Figures 1–4: views of the data-to-information-to-knowledge-to-wisdom progression over time]
The fact that we can do more, at faster speeds and at higher quality, means we can continue to increase the complexity of our analyses. However, there appears to be an unfortunate disconnect: we are moving toward knowledge but not toward wisdom. It’s true that our knowledge will continue to increase ever more quickly, but what we do with that knowledge as a society is a source of great fear as we move toward this singularity so fast.
Fast is an understatement. This is fast even for exponential progressions – so fast that it’s hard to express and digest the magnitude of how quickly we’re moving. We’ve already moved from:
- The early-90s experimentation of simply placing existing content and history on the web, with loose hyperlinking and web logs
- On to the late 90s, conducting eCommerce, running financial modeling and simulations, and building product catalogs with metadata that let us relate and predict: if a user liked a quality of one item, they might like another item that shares it (a sketch of this idea follows this list)
- To the early 2000s, engineering social and true-community solutions that began to build on relational networks, using semantics, continually sharing content on timelines, and tracking where photos were taken as GPS devices began to appear in our pockets
- To the 2010s, where today we are looking for new ways to collaborate, to make new discoveries in the cloud, and to use the billions and billions of sensors and data streams to create more powerful and knowledgeable applications
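As a toy illustration of that late-90s metadata matching, here is a minimal Python sketch; the catalog items, tags, and the simple Jaccard overlap score are our own illustrative choices, not any particular vendor’s method:

```python
# Minimal sketch of metadata-based matching: if two catalog items
# share qualities (tags), a user who liked one may like the other.
# All items and tags are invented for illustration.

def jaccard(a: set, b: set) -> float:
    """Overlap of two tag sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

catalog = {
    "hiking boots": {"outdoor", "footwear", "waterproof"},
    "trail jacket": {"outdoor", "apparel", "waterproof"},
    "dress shoes":  {"formal", "footwear"},
}

liked = "hiking boots"
scores = {item: jaccard(catalog[liked], tags)
          for item, tags in catalog.items() if item != liked}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:.2f}")  # trail jacket (0.50) beats dress shoes (0.25)
```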
Here’s another way to digest this progression:
| Web Version | Time | DIKW | Web Maturity | Knowledge Domain Leading Web | Data Use Model on Web | Data Maturity on Web |
|---|---|---|---|---|---|---|
| 3.0 | 2015 and the predictable web | Knowledge | +Collaboration | Science | Data as 4th paradigm | TempoSpatial (goes public) |
| 4.0 | 2020–2030 | Wisdom in sectors | Advancing collaboration with 3rd-world core | Advancing science into shared services (philosophical is out-year) | Robot/ant data quality | Sentiment and predictive (goes public/useful); sensitive is out-year |
Circling Back to Where We Began
Taking us back to the beginning: although this blog lacks specific citations, if you agree that professional spatial mapping took hold in the early 2000s, that it has since hit the public, and that spatially tagged data has passed both of these tipping points with the advent of smartphones, map apps, local scouts, augmented-reality directions, and multi-dimensional modeling integrating GIS and CAD with the web, then you can see that the data science maturity stage with the largest impact right now is… Geospatial.
Geospatial data is different. Prior to geospatial, data was non-dimensional: it had many attribute and categorical facets, but it did not have to be stored in a mathematical or pictorial form with a specific relation to a position on the earth. Spatial data – GIS, CAD, lat/longs – has to be stored numerically so that it can be calculated upon, and it has to be related to a grounding point. Geospatial is essentially the storage of vector maps or pixel (raster) maps. When you put that together for tens of millions of streams, you get a very large, complicated, spatially referenced hydrography dataset. It gets even more complicated when you overlay 15-minute time-based data, such as water attributes (flow, height, temperature, quality, changes, etc.). And it becomes more complicated still when you combine that data with other dimensions, such as earth elevations, and then need to relate it across disparate domains of science, each speaking a different language, to calculate how fast water may flow through a certain containment down a slope after a riverbank or levee collapses.
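To make that concrete, here is a minimal Python sketch of the idea: two hypothetical stream gauges stored as numeric lat/long points (assumed to be WGS 84), each carrying 15-minute flow readings, with a great-circle distance computed directly from the stored numbers. The gauge names and readings are invented for illustration:

```python
# Minimal sketch of spatial data stored numerically and tied to a
# grounding reference: coordinates are assumed WGS 84 lat/long, and
# all gauge names and readings are invented for illustration.

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius: the grounding constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Each gauge is a spatial point plus 15-minute attribute readings
# (flow in cubic meters per second).
gauges = {
    "gauge_upstream":   {"lat": 38.90, "lon": -77.04, "flow": [12.1, 12.4, 13.0]},
    "gauge_downstream": {"lat": 38.80, "lon": -77.12, "flow": [11.8, 12.0, 12.9]},
}

up, down = gauges["gauge_upstream"], gauges["gauge_downstream"]
dist = haversine_km(up["lat"], up["lon"], down["lat"], down["lon"])
print(f"Gauges are {dist:.1f} km apart")  # the distance falls out of the numbers
```

Everything here – the distance calculation, the overlay of time-series attributes on points – only works because the coordinates are numbers in a shared reference frame; that is the sense in which spatial data must be “grounded.”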
Before we can get to those more complex scenarios, we have to master geospatial data – the next progression in data complexity.