[Originally posted in May 2013 – We’re finding this article’s content is as relevant today as it was then, so we’re re-posting.]

The I in Information Technology is so very broad. Given that, why does our first integrated data science problem focus on spatial data? Why is it first among our data areas: Geospatial. Open. Big. IoT. We get asked this a lot. Our reason, like geospatial itself, is multi-dimensional. It spans different ways of thinking, audiences, maturity, progressions, science, modeling, and time:

First, from a time perspective, we all know that technology is moving very fast – mostly following “Moore’s Law”. This law has had people on edge – MIT declared in 2016 that Moore’s Law was dead, then IBM noted in 2017 that it’s back on. Nonetheless, this rapid doubling – 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, and so on – has, in less than 20 years, produced a roughly 10,000-fold increase in computing and storage capability, with 90% of that increase in the last 10 years alone.
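To make the arithmetic concrete, here is a minimal sketch (ours, not from the original post) of compound growth under a doubling period; assuming a doubling roughly every 18 months, 20 years works out to about the 10,000-fold figure above.

```python
# A minimal sketch, assuming a Moore's-Law-style doubling of capability
# roughly every 18 months (the doubling period is our assumption).
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplicative increase in capability after `years`."""
    return 2 ** (years / doubling_period_years)

print(f"After 20 years: ~{growth_factor(20):,.0f}x")  # ~10,321x
print(f"After 10 years: ~{growth_factor(10):,.0f}x")  # ~102x
```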

[Figure: Xentity DIKW Progression]

To show this progress in terms of the I in IT, we have put together a few views of this rapid curve, such as the data, information, knowledge, wisdom (DIKW) progression on the y-axis over time on the x-axis.

If we take only public web content, data on the web is at the beginning of the curve. Granted, early data products have taken the longest period to mature – about 10-15 years. And data can only get better as it matures toward being 25 years old on the web.

As of the 2010s, we are in the information period now, yet technology advances have us moving almost too quickly into the knowledge period. Just see how much more scientific data visualization there is, and how dependent we are on the internet. Just think how much you were on the web in 1998 compared to 15 years later – IT IS IN YOUR POCKET now. Think how AI, IoT, and new interfaces will change everything in 15 more years.

[Figure: whydata5 – the four progressions]

Taking in that high-level concept progression, we have identified four different impacts of it:

  • common knowledge domain progression,
  • scientific capability progression,
  • adoption progression,
  • and device progression.

Our first progression shows the move from first using the internet for records searches, generally of historical context, to math and computational uses in the 90s. Now it has moved to hard science analytics, personalized signals, and models. We are now only a decade or so away from sentiment and sensitive capabilities, which bring incredibly challenging socio-ethical and philosophical discussions once left to sci-fi and computer science geek talk.

This means the 1970s-into-2010s reign of MIS/information systems ruling the computing world will soon be playing second fiddle, if it isn’t already in some sectors. Data-driven enterprises that support decisions have taken over. It’s now a race for organizations to move toward knowledge first.

This is the reason GIS has been shifting over the last decade to geospatial data products and services. Knowledge is taking the lead over records, business process, and compliance processing.

In the second progression above, science used to be experimental and hypothetical in labs. It moved a generation ago into computational models. Now the expectation is that those models need to be fed by big, fast, spatial data from records, taxonomies, sensors, and shared research. This is not an information system – this is a web of atomic data. The fact that end communities of use are demanding the data flow, supply chain, and experience to address the gaps and move beyond MIS models speaks volumes: data quality is severely lagging the level of investment in technology. This again is because technology returns double the production capability for the same level of investment every two years, whereas data still returns on a linear, analog-era investment curve.

In the third progression, what used to live only in government skunkworks or in the proprietary research data moats of financial, energy, and marketing organizations is now in public collaboration environments, and Python notebooks are moving into life-event-driven architectures. As each capability is adopted, it shows how the leading industries have invested, not unlike the progress of human intelligence over the history of humankind. First it was about moving content to digital. Once secure, information exchange enabled commerce transactions. Then communities crossing previous geographic limitations formed online. And finally, those communities found niche ways to collaborate together. This follows the same offline progression – thousands and thousands of years from writing to payment to trade to industry – yet on a Moore’s Law scale of a few decades.

Finally, in the fourth progression, computing has shrunk, bringing the compute ever closer to the data – from powerful phone and tablet computing that processes sensor and field collection with direct communication to collaborative and service applications, to immensely flexible and scalable computing available to anyone with a credit card, with fairly fat pipes to bring data to world-class computing (aka, the cloud). This means the compute is no longer confined to a data center or desktop; nodes are truly everywhere. This sets up the next 10-15 years to be the decade of the interface, as form factor is no longer limited by confining physical barriers. We are only experimenting with adjustments to reality – augmented reality, virtual reality, beacons, online avatar MMORPG worlds. What these new interfaces will be, the creatives of the world are only teasing us where it may land – smart home, smart city, holographic, new lenses (glasses to embedded), singularity?

Bringing it back to the purpose of this blog post – this progression of capability brings us to how we can blend what, when, and where – TempoSpatial – into the flourishing cognitive and language world.

This isn’t just our theory.

RadarNetworks put together the visual of progressing through the web eras. Web 1.0 was websites or Content and early Commerce sites. Web 2.0 raised the web community with blogs and the web began to link collaboratively built information with wikis. Web 3.0 is ushering in the semantic direction and building integrated knowledge.

[Figure: RadarNetworks – Towards a Web OS]

Even scarier, the public web content progression lags several business domains – not necessarily in this leading order: Intelligence, Financial, Energy, Retail, and Large Corporate Analytics. Meaning, this curve reflects public maturity, and those other domains follow different and faster curves.

The recent discussions on intelligence analysis linking social/internet data with profiles, Facebook/Google privacy and its use for personalized advertising, the level of detail Salesforce knows about you and why companies pay so much per license/seat, how energy exploration optimizes where to drill in harder-to-find areas, or the absolute complexity and risk of financial derivatives as the world market goes – how we integrate public content, for googling someone or using the internet to learn more and faster, usually lags behind these technologies. Reason: the public uses do not make money. It is the same reason the DoD invented the internet – it was driven by the security of the U.S., which makes money, which makes power.

So, that digression aside (as we have been told, “well, my industry is different”), the public progression does follow an exponential curve that matches Moore’s Law, the driving factor in IT capability – every two years, computing power doubles at the same cost (paraphrasing). The fact that we can do more, faster, at quality means we can continue to increase our complexity of analysis (shown in red). And there appears to be a stall – not as we move toward knowledge, but in moving on toward wisdom. It’s true our knowledge will continue to increase VERY fast, but what we do with that as a society is the “fear” as we move toward this singularity so fast.

Fast is an understatement – very fast, even for an exponential progression – as it is hard to convey and digest the magnitude of just how fast it is moving. We moved from

  • the early 90s, simply placing history up there, experimenting, and having general content with loose hyperlinking and web logs,
  • to the late 90s, conducting eCommerce, doing math/financial interaction modeling and simulations, and building product catalogs with metadata that let us relate items and say that if a user found a certain quality or metadata in something, they might like something else over here,
  • to the early 2000s, engineering solutions – including social and true community solutions – that began to build on top of relational data and the network effect, use semantics, and continually share content on timelines and where a photo was taken as GPS devices began to appear in our pockets,
  • to the 2010s, or today, where we are looking for new ways to collaborate, find new discoveries in the cloud, and use the billions and billions of sensors and data streams to create more powerful, more knowledgeable applications.

Another way to digest this progression is via the table below.

| Web Version | Time | DIKW | Web Maturity | Knowledge Domain Leading the Web | Data Use Model on Web | Data Maturity on Web |
|---|---|---|---|---|---|---|
| 0.9 | early 90s | Data | Content | History | Experimental | Logs |
| 1.0 | 1995+ | Info | | History | Experimental | Content |
| 1.1 | 1997 | | | Math | Experimental | Relational |
| 1.2 | 1999+ | | Commerce | Math | Hypothetical | Metadata |
| 1.3 | 2002 | | | Engineering | Hypothetical | Spatial |
| 2.0 | 2005+ | Knowledge | +Community | Engineering | Computational | Temporal |
| 2.1 | 2010s | | | Engineering | Computational | Semantic |
| 3.0 | 2015 and the predictable web | Knowledge | +Collaboration | Science | Data as 4th paradigm | TempoSpatial (goes public) |
| 4.0 | 2020–2030 | Wisdom in sectors | Advancing Collaboration with 3rd-world core | Advancing Science into Shared Services (Philosophical is out-year) | Robot/Ant data quality | Sentiment and Predictive (goes public/useful); Sensitive is out-year |

Let’s Pause

So, taking it back to the “now” – though this blog post lacks specific citations (blogs allow us to cheat, but our research sources will make sure to detail and source our analysis) – if you agree that spatial mapping for professionals took hold in the early 2000s, agree that it has now hit the public, and understand that spatially tagging data has passed the tipping point with the advent of smartphones, map apps, local scouts, augmented-reality directions, and multi-dimensional modeling integrating GIS and CAD with the web, then you can see that the data science maturity stage we are in with the largest impact right now is Geospatial.

Geospatial data is different. Prior to geospatial, data is non-dimension-based. It has many attributable and categorical facets, but it does not have to be stored in a mathematical or picture form with a specific relation to earth position. Spatial data – GIS, CAD, lat/longs – has to be stored in numerical fashion in order to calculate upon it. Furthermore, it has to be related to a grounding point. Essentially, geospatial is storing vector maps or pixel (raster) maps. When you begin to put that together for tens of millions of streams, you get a very large, complicated, spatially referenced hydrography dataset. It gets even more complicated when you overlay 15-minute time-based data such as water attributes (flow, height, temperature, quality, changes, etc.). Even more complicated when you combine that data with other dimensions such as earth elevations and need to relate across domains of science, each speaking a different language, to be able to calculate how fast water may carry a certain contaminant down a slope after a river bank or levee collapses.
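To make the “stored numerically, related to a grounding point” idea concrete, here is a minimal sketch (our illustration, not from the original article): two hypothetical stream gauges stored as latitude/longitude pairs, and a great-circle distance calculated between them – the kind of earth-referenced math that non-spatial data never requires.

```python
# A minimal sketch: spatial data must be stored numerically and referenced to an
# earth position before you can calculate on it. The gauge coordinates below are
# hypothetical examples, not from the original post.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

gauge_a = (39.7392, -104.9903)  # hypothetical gauge near Denver, CO
gauge_b = (40.0150, -105.2705)  # hypothetical gauge near Boulder, CO
print(round(haversine_km(*gauge_a, *gauge_b), 1), "km")  # ~38.9 km
```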

Before we can get to those more complex scenarios, geospatial data is the next progression in data complexity.