[Originally posted in May 2013 – We’re finding this article content is as viable today as it was then, so we’re re-posting]

The “I” in Information Technology is incredibly broad, so why do we focus first on “spatial data” when approaching integrated data science problems? Why does it lead our data areas: Geospatial, Open, Big, IoT? We get this question a lot. Our reason for focusing on Geospatial is that it is multi-dimensional: it crosses many different ways of thinking, audiences at varying levels of maturity, progressions, sciences, models, and time scales.

 

Analyzing the History of Information

 

From the perspective of time, we know that technology is developing far too quickly, mostly following Moore’s Law. While the long-term applicability of Moore’s Law has been questioned recently (MIT declared Moore’s Law dead in 2016, then IBM noted it was alive and well again in 2017), this rapid doubling – 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, ~8k, ~16k, and so on – has produced roughly a 10,000-fold increase in computing and storage capability in fewer than 20 years, with around 90% of that increase occurring in the last 10 years alone.
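
To make the doubling arithmetic concrete, here is a minimal Python sketch (not from the original post; the 1.5-year doubling period is an assumption chosen so that 20 years of doubling lands near the 10,000-fold figure above). It illustrates the shape of the curve, and in particular why the bulk of the absolute increase piles up in the most recent years.

```python
# A rough sketch of the doubling arithmetic behind the paragraph above.
# Assumption: capability doubles roughly every 1.5 years (not a figure from
# the post itself), chosen so 20 years of doubling is near 10,000-fold.
DOUBLING_PERIOD_YEARS = 1.5   # assumed doubling interval
HORIZON_YEARS = 20

def capability(years: float) -> float:
    """Relative computing/storage capability after `years`, with year 0 = 1.0."""
    return 2 ** (years / DOUBLING_PERIOD_YEARS)

total_gain = capability(HORIZON_YEARS) - 1
first_decade_gain = capability(HORIZON_YEARS - 10) - 1
share_in_last_decade = (total_gain - first_decade_gain) / total_gain

print(f"Capability multiple after {HORIZON_YEARS} years: {capability(HORIZON_YEARS):,.0f}x")
print(f"Share of the total increase arriving in the last 10 years: {share_in_last_decade:.0%}")
```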

 

To show this progress in terms of the “I” in “IT”, we have put together a few views of this rapid curve, measuring the progression from data to information to knowledge to wisdom (the y-axis) over time (the x-axis).

Xentity DIKW Progression

If we take only public web content, data on the web sits at the beginning of the curve. The earliest data products have had the longest time to mature, about 10-15 years, and can only improve as they continue maturing toward 25 years and beyond.

 

As of the 2010s, we are in the information period, yet technological advance has us moving all too quickly into the knowledge period. Think how dependent we are on the Internet today. Remember how often you were on the web in 1998 compared with 15 years later; the web now fits right in your pocket. Now think how AI, IoT, and new interfaces can and will change everything in another 15 years.

 

Tracing this high-level concept through time, we have mapped out four different impacts:

 

  • Common knowledge domain progression
  • Scientific capability progression
  • Adoption progression
  • Device progression

 

Four Impacts on the Progression from Data to Wisdom

 

Our first progression traces a timeline of web uptake by different knowledge domains. It begins with the Internet being used primarily to search historical records, then for computational math in the 90s. It has since moved into hard-science analytics, personalized signals, and models. We are now only a decade or so away from sentiment and sensitive capabilities, raising profoundly complex socio-ethical and philosophical discussions that were once solely the realm of science fiction and computer-science geek talk.

 

This means the 40-year reign of MIS / information systems over the computing world is ending, if it hasn’t already ended. Data-driven enterprises using analytics to support decisions have taken over, and it is now a race for organizations to move rapidly toward knowledge first.

 

This is why GIS has been shifting to geospatial data products and services over the last decade. Knowledge is taking the lead over records, business processes, and compliance processing.

The second progression tracks the progress of scientific capabilities. Science used to be experimental and hypothesis-driven, conducted in laboratories. A generation ago it moved into computational models. Now the expectation is that those models must be fed big, fast, spatial data drawn from records, taxonomies, sensors, and shared research. This is not an information system; it is a web of atomic data. The fact that communities of end users are demanding improved data flows, better supply chains, and advanced experiences to address these gaps and move beyond MIS models speaks volumes about how severely data quality lags behind technology in investment. Again, this is because technology requires the same level of investment every two years to double its production capability, whereas data is still on a linear curve, returning roughly in proportion to what is invested.

The third progression tracks public adoption of the web. What used to be squirreled away in government skunkworks or in the proprietary research-data moats of financial, energy, and marketing organizations now lives in public collaboration environments, and Python notebooks are moving into event-driven architectures. As each capability is adopted, it demonstrates how the leading industries have invested, not unlike the progress of human intelligence over the history of humankind. Originally it was about moving content to digital, then to secure, information-exchange commerce transactions. Next, communities formed online, overcoming geographic limitations. Finally, those communities found niche ways to collaborate. This tracks the offline progression from writing to payment to trade to industry over thousands of years, yet on a Moore’s Law scale of a few decades.

Finally, the fourth progression tracks the adoption of consumer devices. Computers have shrunk to the point where they sit ever closer to the data itself: from powerful phone and tablet computing, to process sensors and field collection with direct communication, to collaborative and service applications, to immensely flexible and scalable computing available to anyone with a credit card, to fat pipes bringing data to world-class computing (a.k.a. the cloud). The computer is no longer confined to a data center or a desktop; nodes are truly everywhere. The next 10-15 years will be the era of the interface, as physical barriers no longer confine form factor. We are only beginning to experiment with adjustments to reality – augmented reality, virtual reality, beacons, and online avatar MMORPG worlds. As for what these new interfaces will become, for now the creatives of the world are only teasing us: smart homes, smart cities, holographic interfaces, IoT lenses (glasses or embedded), even the singularity?

 

The Challenge of Perpetual Acceleration

 

Returning to the purpose of this blog post: the progression of these capabilities has brought us to a point where we can blend the what, when, and where – TempoSpatial data – into the flourishing cognitive and language world.

 

This isn’t just our theory.

 

Radar Networks put together the visual below tracing the progress of the Internet through the different web eras. Web 1.0 was websites: content and early commerce sites. Web 2.0 augmented the web community with blogs and began to link collaboratively built information with wikis. Web 3.0 is ushering in semantic direction and building integrated knowledge.

Radar Networks: Towards a Web OS

 

The fact that we can do more, at faster speeds, at higher quality levels means we can continue to increase the complexity of our analyses. However, there appears to be an unfortunate disconnect: we are moving toward knowledge but not toward wisdom. It’s true that our knowledge will continue to increase ever more quickly, but what we do with that knowledge as a society is a source of great fear as we move toward this singularity so fast.

Fast is an understatement. This is fast even for exponential progressions – so fast it’s hard to express and digest the magnitude of how quickly we’re moving. We’ve already moved from:

  • The early-90s experimentation of simply placing history on the web: general content with loose hyperlinking and web logs
  • On to the late 90s: conducting eCommerce, doing math and financial-interaction modeling and simulations, and building product catalogs with metadata that let us relate and predict – if a user liked a quality or piece of metadata in one thing, they might like something similar elsewhere
  • To the early 2000s: engineering social and true-community solutions that began to build on relational networks and semantics, continually sharing content on timelines, and tracking where photos were taken as GPS devices began to appear in our pockets
  • To the 2010s, where today we are looking for new ways to collaborate, to make new discoveries in the cloud, and to use the billions and billions of sensors and data streams to create more powerful and knowledgeable applications

Here’s another way to digest this progression:

| Web Version | Time | DIKW | Web Maturity | Knowledge Domain Leading Web | Data Use Model on Web | Data Maturity on Web |
|---|---|---|---|---|---|---|
| 0.9 | early 90s | Data | Content | History | Experimental | Logs |
| 1.0 | 1995+ | Info | | History | Experimental | Content |
| 1.1 | 1997 | | | Math | Experimental | Relational |
| 1.2 | 1999+ | | Commerce | Math | Hypothetical | Metadata |
| 1.3 | 2002 | | | Engineering | Hypothetical | Spatial |
| 2.0 | 2005+ | Knowledge | +Community | Engineering | Computational | Temporal |
| 2.1 | 2010s | | | Engineering | Computational | Semantic |
| 3.0 | 2015 and predictable web | Knowledge | +Collaboration | Science | Data as 4th paradigm | TempoSpatial (goes public) |
| 4.0 | 2020-2030 | Wisdom in sectors | Advancing Collaboration with 3rd world core | Advancing Science into Shared Services – Philosophical is out year | Robot/Ant data quality | Sentiment and Predictive (goes public/useful) – Sensitive is out year |

Circling Back to Where We Began

Taking us back to the beginning: although this blog lacks specific citations, if you agree that professional spatial mapping originated in the early 2000s, if you agree that it has since hit the public, and if you understand that spatially tagging data has passed both of these tipping points with the advent of smartphones, map apps, local scouts, augmented-reality directions, and multi-dimensional modeling integrating GIS and CAD with the web, then you can see that the data-science maturity stage with the largest impact right now is… Geospatial.

Geospatial data is different. Prior to geospatial, data was non-dimensional: it had many attributable and categorical facets, but it did not have to be stored in a mathematical or pictorial form with a specific relation to the earth’s position. Spatial data – GIS, CAD, lat/longs – has to be stored numerically in order to calculate upon it, and it has to be related to a grounding point. Geospatial is essentially storing vector maps or pixel maps. When you begin to put that together for tens of millions of streams, you get a very large, complicated, spatially referenced hydrography dataset. It gets even more complicated when you overlay that with 15-minute time-based data, such as water attributes (flow, height, temperature, quality, changes, etc.). It becomes more complicated still when you combine that data with other dimensions, such as earth elevations, and then need to relate it across the disparate domains of science, each speaking a different language, to calculate how fast water may flow through a certain containment down a slope after a riverbank or levee collapses.
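
As a minimal illustration of “numerical and grounded,” here is a Python sketch (the GaugeReading structure, field names, and sample values are hypothetical, not taken from any real hydrography dataset): each observation ties a 15-minute time-stamped flow value to a coordinate pair, and even a basic question like how far apart two gauges are requires trigonometry on those coordinates rather than a lookup on categorical attributes.

```python
# Hypothetical sketch: a stream-gauge reading grounded to an earth position
# (lat/lon) with a 15-minute time-stamped flow attribute, plus the great-circle
# distance between two gauges computed from those numeric coordinates.
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class GaugeReading:
    lat: float           # degrees, WGS84 grounding point
    lon: float           # degrees, WGS84 grounding point
    timestamp: datetime  # 15-minute observation interval
    flow_cfs: float      # stream flow, cubic feet per second

def haversine_km(a: GaugeReading, b: GaugeReading) -> float:
    """Great-circle distance between two gauges, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

upstream = GaugeReading(39.74, -105.00, datetime(2013, 5, 1, 12, 0), flow_cfs=850.0)
downstream = GaugeReading(39.70, -104.95, datetime(2013, 5, 1, 12, 15), flow_cfs=910.0)
print(f"{haversine_km(upstream, downstream):.1f} km apart, "
      f"flow change {downstream.flow_cfs - upstream.flow_cfs:+.0f} cfs over 15 minutes")
```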

Before we can get to those more complex scenarios, geospatial data itself is the next progression in data complexity to master.