BigData is a big deal because it can help answer questions fast


BigData is not just size and speed of complex data – it is moving us from information to knowledge

 

As our Why we focus on spatial data science article discusses, the progress of knowledge fields – history to math to engineering to science to philosophy – and the individual pursuit of knowledge have moved from experiments to hypotheses to computation and now to The Fourth Paradigm: Data-Intensive Scientific Discovery. That progression played out over the course of human history and is now repeating itself on the internet.

The early 90s web was about content, history, and experiments. The late 90s web was about transactions, security, and eCommerce. The 2000s web was about engineering entities breaking silos – within companies, organizations, sectors, and communities. The 2010s web has been about increasing collaboration – in communication and work production – and entering into knowledge collaboration. The internet's progression is simply emulating the way human capability developed across history.

When you are ready to move into BigData, it means you want to answer new questions.

That said, the BigData phenomenon is not about the input of all the raw data and the explosion that the Internet of Things is touted to bring. The resource sells, but the end product is the consumed by-product. So let's focus on that by-product – knowledge. It is not about the speed of massive amounts of new, complex, variable-quality data, as our discussion of IBM's 4 V's covers.

It's about what we can do cheaply with technology that previously required supercomputer clusters only the big boys had. Now, with cloud, the internet, and enough standards, if we have good and improving data, we ALL have the environment to answer complicated questions while sifting through the noise. It's about enabling the initial phase of knowledge discovery – the very thing everyone is complaining about on the web right now: "too much information" or "drowning in data".

The article on Throwing a Lifeline to Scientists Drowning in Data discusses how we need to be able to “sift through the noise” and make search faster. That is the roadblock, the tall pole in the tent, the showstopper.

Parallelizing the search is the killer app – this is the Big Deal, we should call it BigSearch

If you have to search billions of records and map them to another billion records, doing that in sequence is the problem. You need to shorten the time it takes to sift through the noise. That is why Google became an amazing success out of nowhere. They did and are currently doing it better than anyone else – sifting through the noise.
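To make the point concrete, here is a minimal sketch – not the approach of any particular product – of the difference between matching records sequentially and fanning the same work out across worker processes. The record sets and the matching rule are hypothetical placeholders.

```python
from multiprocessing import Pool

# Hypothetical record sets: in practice these would be billions of rows,
# sharded across machines rather than held in two in-memory lists.
left = [{"id": i, "key": i % 1000} for i in range(100_000)]
right_index = {i: {"id": i} for i in range(1000)}  # pre-built lookup on the join key

def match_chunk(chunk):
    """Match one chunk of records against the indexed set."""
    return [(rec["id"], right_index[rec["key"]]["id"])
            for rec in chunk if rec["key"] in right_index]

def chunks(seq, n):
    """Split a list into n roughly equal chunks."""
    size = len(seq) // n + 1
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    # Sequential: one worker walks every record.
    sequential = match_chunk(left)

    # Parallel: the same work fanned out across 8 worker processes.
    with Pool(processes=8) as pool:
        parallel = [pair for part in pool.map(match_chunk, chunks(left, 8))
                    for pair in part]

    assert len(sequential) == len(parallel)
```

The matching rule is not the point; the point is that the time to sift drops roughly with the number of workers, which is the whole reason to parallelize the search.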

The United States' amazing growth is because of two things – we have resources and we figured out how to get to them faster. Each growth phase of the United States was based on that fact alone, plus a bit of stopping the barbarians at the gates, or ourselves, from causing implosion. You could say that is civilization. Some softball examples out of hundreds:

  • Expanding West dramatically exploded after trains, which allowed for regional foraging and mining
  • Manufacturing dramatically exploded production output, which allowed for city growth
  • Engines shortened time between towns and cities, which allowed for job explosion
  • Highway systems shortened time between large cities, which allowed for regional economies
  • Airplanes shortened time between the legacy railroad time zones, which allowed for national economies
  • Internet shortened access to national resources internationally, which allowed for international economies
  • Computing shortened processing time of information, which allows for micro-targeted economies worldwide

Each “age” resulted in shortening the distance from A to B. But Google is sifting through data. Scientists are trying to sift as well – through defined data sensors – to link them together and ask very targeted simulated or modeled questions. We need to address the barriers limiting entities' success in doing this.

 

GAO releases report on FGDC Role and Geospatial Information


GAO released a report on the use of geospatial information titled “OMB and Agencies Can Reduce Duplication by Making Coordination a Priority”. The Reader's Digest version: focus on integrating data.


We tend to agree. FGDC is currently very focused on a service-enabling management model (the Geoplatform) to accomplish this. It is bold, but if its role as a service provisioner can directly or indirectly get it in the game on the real problem – data lifecycle management – it will have a chance to address this.

Point being, FGDC knows its role is not to be in IT operations as a direct goal. But it also saw that being a sideline judge with neither carrot nor stick would not garner the direction and recommendations that GAO suggests. It is getting on the playing field, taking advantage of the open service provider role, being that broker, and using that role to drive IT costs down, which in turn frees money to focus on the data issues cited. It is a bold and unique approach, and there are many questions about whether a traditionally non-operational group can develop the culture to be effective. The proof will show over the next two years.

Below find our summary of strategic direction for FGDC’s geoplatform.

The challenges and recommendation sections are:

  1. FGDC Had Not Made Fully Implementing Key Activities for Coordinating Geospatial Data a Priority
  2. Departments Had Not Fully Implemented Important Activities for Coordinating and Managing Geospatial Data
  3. Theme-lead Agencies Had Not Fully Implemented Important Activities for Coordinating and Managing Geospatial Data
  4. OMB Did Not Have Complete and Reliable Information to Identify Duplicative Geospatial Investments

Our review of Background – then and now

First, the foundation the FGDC has put in place: the Federal Geographic Data Committee (FGDC) has always been a catalyst and leader enabling the adoption and use of geospatial information.

The Federal Geographic Data Committee (FGDC) has been successfully creating the geospatial building blocks for the National Spatial Data Infrastructure (NSDI) and empowering users to exploit the value of geospatial information.  The FGDC has been leading the development of the NSDI by creating the standards and tools to organize the asset inventory, enhance data and system interoperability and increase the use of national geospatial assets. The FGDC has successfully created policy, metadata, data and lifecycle standards, clearinghouses, catalogs, segment architectures and platforms that broaden the types and number of geospatial users while increasing the reuse of geospatial assets. [1] 

What is next? The Geospatial Platform and NGDA portfolio will be the mechanism for adoption of shared geospatial services to create customer value

Recently, the FGDC and its partners have expanded their vision to include the management and development of a shared services platform and a National Geospatial Data Asset (NGDA) portfolio.  The goals are to “develop National Shared Services Capabilities, Ensure Accountability and Effective Development and Management of Federal Geospatial Resources, and Convene Leadership of the National Geospatial Community benefitting the communities of interest with cost savings, improved process and decision making”.[2]

As the FGDC continues on the road to establish a world class geospatial data, application and service infrastructure, it will face significant challenges “where the Managing Partner, along with a growing partner network, will move from start‐up and proof‐of‐concept to an operational Geospatial Platform”.[3]

Xentity has reviewed the FGDC’s current strategy, business plan and policies and identified the following critical issues that need to be solved to attain the goals:

  • Building and maintaining a federated, “tagged”[4] standards-based NGDA and an open interoperable Geospatial platform. The assets need to provide sufficient data quantity and quality with service performance to attract and sustain partner and customer engagement[5]
  • Developing a customer base with enough critical mass to justify the FGDC portfolio and provide an “Increased return on existing geospatial investments by promoting the reuse of data application, web sites, and tools, executed through the Geospatial Platform” [6]
  • Improving Service Management and customer-partner relationship capabilities to accelerate the  adoption of interoperable “shared services” vision and satisfy customers [7]
  • Executing simple, transparent and responsive Task Order and Requirements management processes that result in standards based interoperable solutions.  [8]

The Big Challenges

Establish the financial value and business impact of the FGDC’s Portfolio!

The Geospatial Platform and NGDA will provide valuable cost-saving opportunities for their adopters.  They will save employees' time, avoid redundant data acquisition and management costs, and improve decision making and business processes.  The financial impact to government and commercial communities could be staggering. It is a big and unknown figure.

The Geospatial Platform, by definition and design, is a powerful, efficient technology with the capacity to generate a significant return on investment.  It is a community investment and requires community participation to realize the return.  The solution will need to assist the communities with the creation and sharing of return-on-investment information, cost modeling, case studies, funding strategies, tools, and references, and continue to build the investment justification.  It will need to optimize funding enhancement and be responsive to shorter-term “spot” or within-current-budget opportunities while always positioning for long-term sustainability.  The FGDC Geospatial Platform Strategic Plan suggests a truly efficient capability could create powerful, streamlined channels between much broader stakeholder communities, including citizens, the private sector, or other government-to-government interfaces. Similar to the market and business impacts of GPS, DOQs, and satellite imaging technology, the platform could in turn promote more citizen satisfaction, private sector growth, or multiplier effects on engaged lines of business.

To get a big return, it will demand continuous creative thinking to develop investment, funding, management and communication approaches to realize and calculate the value.  It is a complex national challenge involving many organizations, geospatial policy, conflicting requirements, interests and intended uses.

The key is demonstrable successes.  Successes become the premise for investment strategy and cost savings for the customers.  Offering “a suite of well‐managed, highly available, and trusted geospatial data, services, and application, web site for use by Federal agencies—and their State, local, Tribal, and regional partners” [9] is the means to create the big value.  

”A successful model of enterprise service delivery will create an even greater business demand for these assets while reducing their incremental service delivery costs.” [10]

FGDC has to create and tell a compelling “geospatial” value proposition story

Successfully implementing the FGDC's vision will demand a robust set of outreach and marketing capabilities.  The solution will need to help construct the platform's value proposition and marketing story to build and inform the community.  The objective is to ensure longer-term sustainable funding and community participation.  The solution will need to bring geospatial community awareness, incentive modeling, financial evaluation tools, multi-channel communication, and funding development experience to the FGDC.  It will need transparently developed and implemented communication and marketing strategies that have led to growth in the customer base, alternative portfolio funding models, and shared services environments for the geospatial communities.  The approach will need to be transparent, engage the customers and partners, and continuously build the community.

This is a challenging time to obtain needed capital and win customers, even for efficient economic engines like shared geospatial data and services.  The solution will need an approach to community outreach that is impactful and trusted and that tells the story of efficiencies, cost savings, and higher quality information.  The platform and NGDA must impact the customers' program objectives. Figure 1 – FGDC Performance and Value Framework – shows how the platform's value chain aligns with the types of performance benefits that can be realized throughout its inherent processes. The supporting team's understanding of this model will be needed to organize the “story” to convince the customer and partners that the platform can:

  • Provide decision makers with content that they can use with confidence to support daily functions and important issues,
  • Provide consistency of base maps and services that can be used by multiple organizations to address complex issues,
  • Eliminate the need to choose from redundant geospatial resources by providing access to preferred data, maps and services[11] 

As the approach is implemented, the FGDC, its partners, and the communities of interest will have successfully accelerated the adoption and use of location-based information.   Users will recognize the value offering and reap the benefits to their operations and bottom line.   The benefits will be measurable and support the following FGDC business case objectives:

  • Increasing Return on Existing Investments, Government Efficiency, Service Delivery
  • Reducing (unintentional) Redundancy and Development and Management Costs
  • Increasing Quality and Usability[12]

Our Suggested Solution

FGDC's challenges require a PMO, integrated lifecycle management, partner focus, and blended experience, with an integrated approach and a single voice designed to meet the FGDC's strategic objectives and provide a world-class shared services and data portfolio.  Doing this, they can integrate organizations, data, and service provision.

A solution like this would provide the program, partner and customer relationship management, communications, development, and operational capabilities required to successfully implement the FGDC's vision and business plan. The focus will need to:

  1. Coordinate cross-agency tasks and portfolio needs through agile program management with a single voice,
  2. Implement an understanding of the critical lifecycle processes to manage and operate the data, technology, capital assets, and development projects for a secure cloud-based platform,
  3. Have communications and outreach focused on communities, engaging partners and customers in lifecycle decisions, and
  4. Finally, make sure the secretariat staff and team have rotating collective experience, with representatives and contractors who have successfully performed at this scale across all functional areas, with domain knowledge in geospatial, technology, program, service, development, and operations.

The strategy, collective experience, and techniques will enable FGDC to provide a single voice across all management domains (PMO, Development, Operations, and Service Management) for customer engagement. The approach will need to be integrated with the existing FGDC operating model, creating a sum value greater than that of its individual parts. This approach will help create the relationships needed to develop trusted partner services.


[1] Geospatial Platform Business Plan (Redacted Final), p. 7
[2] Draft NSDI Strategic Plan 2014-2016 V2, p. 2
[3] Geospatial Platform Business Plan (Redacted Final), p. 28
[4] Ibid., p. 11
[5] Ibid., p. 9
[6] Ibid., p. 26
[7] Ibid., p. 4
[8] Ibid., p. 6
[9] Geospatial Platform Business Plan (Redacted Final), p. 2
[10] DOI Geospatial Services Blueprint, 2007
[11] Geospatial Platform Business Plan (Redacted Final), p. 13
[12] Ibid., Appendix A
[13] OMB Circular A-16 Supplemental Guidance, p. 12
[14] Geospatial Platform Business Plan (Redacted Final), p. 12
[15] Ibid., p. 36
[16] ITSM Service Operations V3.0
[17] Ibid., p. 26


Will geoscience go for a shared service environment?



As the previous “How can we help geoscience to move their data to shared services” blog noted, unless we align the stakeholders, get a clear line of sight on their needs, and focus on earning trust and demonstrating value, the answer is no. But let’s say we are moving that way. How do we get started to fund such an approach?

Well, first off, the current grant and programmatic funding models are not designed to develop shared services or interoperable data for the geosciences.  Today, there are many geoscientists who are collaborating between disciplines and, as a result, improving the quality of knowledge and scaling the impact of their research.  It is also well established that the vast majority operate individually or in small teams.  Geoscientists, rightly so, continue to be very focused on targeted scientific objectives and not on enabling other scientists; it is a rare case when they have the necessary resources or skills.  With the bright shiny object of data-driven science and Big Data, do we have the Big Head wagging the body of the geoscientist community?  Xentity sees opportunities to develop funding strategies to execute collaborative, performance-based, cross-discipline geosciences. It has been this way since World War II, when the successful war-time onesy-twosy grants to universities were expanded upon. There has been some movement towards hub-and-spoke grant funding models, but we are still out to get our PhD stripes, get our CVs bigger, and keep working with the same folks. I know it is a surly and cynical view. OK, the reality is they are doing amazing work, but in their own fields, and anything that slows down their work for the greater good lacks incentive.

Also, there are few true shared services that are managed and extended to the community the way products and services should be. Data-driven science, our fourth paradigm of science, has been indirectly “demanding” that scientific organizations and their systems become 24×7 service delivery providers. We have been demanding that IT programmers become service managers and that scientists become product managers or data managers.  With a few exceptions, it has not worked. Geoscientists are still struggling to find and use basic data and metadata, and to produce quality metadata for their own purposes (only 60% meet quality standards per EarthCube studies), let alone making the big leap to Big Data and analytics. Data-driven science requires not only a different business or operating model, but a much clearer definition of the program, as well as scientists' roles and expectations.  It requires new funding strategies, incentive models, and a service delivery model underpinned by the best practices of product management and service delivery.
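As a purely illustrative example of the kind of basic quality gate those EarthCube-style findings imply – the required fields and sample record below are hypothetical stand-ins, not any specific metadata standard – a minimal completeness check could look like this:

```python
# Minimal sketch of a metadata completeness check.
# The required fields are hypothetical stand-ins for whatever a community
# profile (ISO 19115, FGDC CSDGM, etc.) would actually mandate.
REQUIRED_FIELDS = ["title", "abstract", "originator", "bounding_box",
                   "temporal_extent", "distribution_url", "use_constraints"]

def completeness(record: dict) -> float:
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

# Hypothetical record pulled from a catalog harvest.
sample = {
    "title": "Stream gauge readings, upper basin",
    "abstract": "15-minute discharge and temperature observations.",
    "originator": "Example Research Group",
    "bounding_box": [-105.5, 39.5, -104.5, 40.5],
    # temporal_extent, distribution_url, use_constraints missing
}

score = completeness(sample)
print(f"completeness: {score:.0%}")                      # -> completeness: 57%
print("meets threshold" if score >= 0.8 else "needs curation")
```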

Currently, and my favorite, there is limited to no incentive for most geoscientists to think beyond their immediate needs.  If geoscientists are to be encouraged to increase the frequency and volume of cross-discipline science, there need to be enablement services, interoperable data, and information products that solve repetitive problems and provide incentive for participation.  We need to develop the necessary incentive and management models to engage and motivate geoscientists, develop a maturity plan for the engineering of shared geoscience services, and develop resourcing strategies to support its execution. Is this new funding models, new recognition models, new education, gamification, crowdsourcing, increased competition, changed performance evaluation? Not sure, as any change to “game” rules can, and usually does, introduce new loopholes and ways to “game” the system.

The concept of shareable geoscience data, information products, and commodity or analytical computing services has an existing operating precedent in the IT domain – shared services.  Shared services could act as a major incentive for participation.  An approach would identify the most valuable cross-cutting needs based on community stakeholder input. The team would use this information to develop a demand-driven plan for shared service planning and investment. As an example, a service-based commodity computing platform can be developed to support both the Big Head and the Long Tail, act as an incentive to participate, and perform highly repetitive data exchange operations.
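To make “highly repetitive data exchange operations” concrete, here is a minimal, hypothetical sketch of one such operation wrapped as a reusable function – normalizing a delimited observation payload so every downstream group gets the same shape. The column names, mapping, and sample payload are placeholders, not a real provider's format.

```python
import csv
import io

# Hypothetical mapping from one provider's column names to a shared schema.
COLUMN_MAP = {
    "site_no": "station_id",
    "datetime": "timestamp_utc",
    "discharge_cfs": "discharge_cfs",
    "temp_c": "water_temp_c",
}

def normalize(raw_csv: str) -> list[dict]:
    """Repetitive exchange step: rename columns and drop anything unmapped,
    so every consumer sees the same shared schema regardless of the source."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        for row in reader
    ]

# Stand-in for a payload that would normally be fetched from a provider endpoint.
payload = (
    "site_no,datetime,discharge_cfs,temp_c,agency_note\n"
    "09123450,2014-05-01T00:15Z,312.0,8.4,provisional\n"
    "09123450,2014-05-01T00:30Z,315.5,8.4,provisional\n"
)

for record in normalize(payload):
    print(record)
```

Done once as a shared service rather than re-implemented by every lab, this is exactly the kind of low-glamour, high-frequency work that frees scientists' time.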

How does one build and sustain a community as large and diverse as the geosciences? 

The ecosystem of geoscience is very complex from a geographic, discipline and skill level point of view. How does one engage so diverse a community in a sustainable manner?  “Increased visibility of stakeholder interests will accelerate stakeholder dialogue and alignment – avoiding “dead ends” and pursuing opportunities.” The stakeholders can range from youthful STEM to stern old school emeritus researchers; from high volume high frequency data producers of macro scale data to a single scientist with a geographically targeted research topic. It is estimated that between 80-85% of the science is done in small projects.  That is an enormous intellectual resource that if engaged can be made more valuable and productive.

Here is a draft target value chain:

The change or shift puts a large emphasis on upfront collaborative idea generation, team building, knowledge sharing via syndication, and new forms of work decomposition in the context of crowd participation (citizen science and STEM).  The recommended change in the value chain begins to accommodate the future needs of the community.  However, the value chain becomes actionable based on the capabilities associated with the respective steps.  Xentity has taken the liberty of alliteratively defining these four classes of capabilities, or capability clusters, as:

Encouragement, Engagement, Enablement, and Execution.

Encouragement capabilities are designed to incentivize or motivate scientists and data suppliers to participate in the community and garner their trust. They are designed to increase collaboration and the quality and value of idea generation, and will have a strong network and community building multiplier effect.

Questions

  • How can new scientific initiatives be collaboratively planned for and developed?
  • How can one identify potential collaborators across disciplines?
  • How can one’s scientific accomplishments and recognition be assured and credited?
  • What are the data possibilities and how can I ensure that it will be readily available?
  • How can scientific idea generation be improved?

Capabilities

  • Incentives based on game theory
  • Collaboration, crowd funding, crowd sourcing and casting
  • Needs Analysis
  • Project Management and work definition
  • Credit for work services

Engagement capabilities include the geoscience participant outreach and communication capabilities required to build and maintain the respective communities within the geoscience areas.  These are the services that will give the community the ability to discuss and resolve where the most valued changes will occur within the geosciences community and who else should be involved in the effort.

Questions

  • What participants are developing collaborative key project initiatives?
  • What ideas have been developed and vetted within the broadest set of communities?
  • Who, with similar needs, may be interested in participating in my project?
  • How can Xentity cost share?

Capabilities

  • Customer Relationship Management
  • Promotions
  • Needs Analysis
  • Communications and Outreach
  • Social and Professional Networking

Enablement capabilities are technical and infrastructure services designed to eliminate acquisition, data processing, and computing obstacles and to save scientists time and resources.  They are designed to solve frequently recurring problems that keep a wide variety of geoscience stakeholders from focusing on their core competency – the creation of scientific knowledge. Enablement services will have a strong cost-avoidance multiplier effect for the community as a whole if implemented and supported (see the sketch after the list below).

Questions

  • How does one solve data interoperability challenges for data formats and context?
  • How do I get data into the same geographic coordinate system or scale of information?
  • How can I capture and bundle my meta information and scientific assets to support publication, validation and curation easily?
  • How can I get access to extensible data storage throughout the project lifecycle?
  • Where and how can I develop an application with my team?
  • How can I bundle and store my project datasets and other digital assets for later retrieval?
  • How can I get scalable computing resources without having to procure and manage servers to complete my project?

Capabilities

  • Workflow
  • Process Chaining
  • Data Interoperability
    • Data transformations
    • Semantics
    • Spatial Encoding and Transformation
    • Data Services
  • Publishing
  • Curation
  • Syndication
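As one concrete example of the data interoperability and spatial encoding items above – getting data into a common coordinate system – here is a minimal sketch assuming the open-source pyproj library; the station coordinates and the choice of target projection (an equal-area CONUS grid) are illustrative only.

```python
# Minimal sketch: reproject observation points from geographic WGS84
# (EPSG:4326) into a common equal-area system (EPSG:5070, CONUS Albers)
# so datasets from different suppliers can be overlaid and measured.
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:5070", always_xy=True)

# Hypothetical gauge locations as (longitude, latitude) pairs.
stations = [(-105.27, 40.02), (-106.82, 39.19), (-104.99, 39.74)]

for lon, lat in stations:
    x, y = transformer.transform(lon, lat)
    print(f"({lon}, {lat}) -> ({x:.1f} m, {y:.1f} m)")
```

Trivial for a GIS specialist, but exactly the kind of recurring chore an enablement service would take off every individual scientist's plate.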

Execution capabilities comprise the key management-oriented disciplines required to support shared infrastructure and services, or to help a highly federated set of valuable asset “edges” evolve to be more useable and valuable to the evolving community over time.

Questions

  • How do we collectively determine what information might require a greater future investment?
  • What are the right incentives in the grant processes?
  • What are the future funding models?
  • What models should be invested in?
  • Which technologies should be evaluated for the shared assets?
  • What upcoming shared data or technology needs are common to a large number of participants?

Capabilities

  • Governance
  • IT Service Management (ITSM)
  • Product Management
  • Performance Management
  • Requirements Management
  • Data Management
  • Data Supply Management
  • Data Life Cycle Management
  • Funding
  • Grants and processing

So, why did we develop these classes of capabilities? 

They represent, at the macro level, a way to organize a much larger group of business, operating, and technical services that have been explicitly discussed in NSF EarthCube efforts over the last 3-4 years. We then derived these outputs from analysis and associated them with the most important business drivers. Check out this draft relationship of capabilities, drivers, and rationale:

Engage
  • Rationale: The best way to create communities and identify common needs and objectives; begin to build trust and value awareness; bring the respective communities into an environment where they can build out their efforts and sustain collaborative approaches.
  • Drivers: Agency (how to navigate planned versus emergent change), intellectual property rights, infrastructure winners and losers, agreement on data storage, preservation, curation policies and procedures, incentives to share data and data sharing policies, and trust between data generators and data users.

Encourage
  • Rationale: The best models to incentivize scientists and data producers to participate and collaborate. Xentity has developed game-theory-based approaches and large-scale customer relationship management solutions.
  • Drivers: Social and cultural challenges: motivations and incentives, self-selected or closely-held leadership, levels of participation, types of organizations, and collaboration among domain and IT specialists.

Enable
  • Rationale: The most costly data processing obstacles – the lowest common denominator, highest-impact problem, and a common problem found in shared service environments. We have developed enterprise service analysis tools for cost-benefit for the DOI geospatial community, so we have seen this work.
  • Drivers: 80% of scientist data needs can be expressed as standard data products, and 80% of scientist time is spent getting data into proper form for research analysis.

Execute
  • Rationale: A governance model that will increase the “edge effect” between the legacy and future capabilities and a very diverse set of communities; simple planning capabilities that empower scientists to work complex cross-discipline ideas amongst themselves, define work, and coordinate with the power of the crowd. We have designed collaborative environments and crowd-based frameworks for data collection and analysis with a corresponding performance management system.
  • Drivers: Conceptual and procedural challenges: time (short-term funding decisions versus the long-term time-scale needed for infrastructures to grow); scale (choices between worldwide interoperability and local optimization).

So why don’t we do it?

Well, this does introduce an outside approach into a close-knit geoscience community that is very used to solving for itself. Having a facilitated method from outside consulting – or even teaming with agency operations that have begun moving this route for their national geospatial data assets – is not seen as something that fits their culture. We are still learning hybrid ways we can collaborate and help geoscientists set up such a framework, but for now it is still a bit of a foreign concept, and while there is some willingness in the geoscientist community to adopt models that work for other sectors, industries, and operational models, the lack of familiarity is causing a lot of hesitation – which goes back to the earn-trust factor and finding ways to demonstrate value.

Until then, we will keep plugging away, connecting with the geoscience community in hopes that we can help them advance their infrastructure, data, and integration to improve earth science initiatives – and until then, we will remain one of the few top nations without an operational, enterprise national geoscience infrastructure.

Why we focus on spatial data science

Blog post edited by Matt Tricomi

The I in Information Technology is so broad – why does our first integrated data science focus center on spatial data? It doesn't seem to fit at face value with our Services Catalog. We get asked this a lot, and this is our reason. Like geospatial itself, it is multi-dimensional, spanning different ways of thinking, audiences, maturity, progressions, science, modeling, and time:

 

In green, on the x-axis, is the time progression of public web content. The summary point is that data took the longest period – about 10-15 years – and data can only get better as it matures, now roughly 25 years into its life on the web. We are in the information period now, but moving swiftly into the knowledge period. Just look at how much more scientific data visualization there is, and how dependent on the internet we have become. Think how much you were on the web in 1998 compared to 15 years later – IT IS IN YOUR POCKET now.

This isn’t just our theory.

RadarNetworks put together the visual of progressing through the web eras. Web 1.0 was websites or Content and early Commerce sites. Web 2.0 raised the web community with blogs and the web began to link collaboratively built information with wikis. Web 3.0 is ushering in the semantic direction and building integrated knowledge.

Even scarier, Public Web Content progression lags several business domains, but not necessarily in this leading order: Intelligence, Financial, Energy, Retail, and Large Corporate Analytics. Meaning, this curve reflects the Public maturity, and those other domains have different and faster curves. 

The recent discussions on intelligence analysis linking social/internet data with profiles, Facebook/Google privacy and its use for personalized advertising, the level of detail Salesforce knows about you (and why companies pay so much per license/seat), how energy exploration is optimizing where to drill in harder-to-find areas, or the absolute complexity and risk of financial derivatives as the world market goes – the way we integrate public content, for googling someone or using the internet to learn more and faster, usually lags these technologies. Reason: those public uses do not make money. It is the same reason the DoD invented the internet – it was driven by the security of the U.S., which makes money, which makes power.

So, that digression aside (as we have been told, “well, my industry is different”), the public progression does follow an exponential curve that matches the driving factor of Moore's Law in IT capability – every two years, computing power doubles at the same cost (paraphrasing). The fact that we can do more, faster, at the same quality levels means we can continue to increase our complexity of analysis, shown in red. And there appears to be a stall: we are not moving toward wisdom even as we move toward knowledge. It's true our knowledge will continue to increase VERY fast, but what we do with that as a society is the “fear” as we move toward this singularity so fast.
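As a back-of-the-envelope illustration of that doubling (the 15-year span from 1998 mentioned above is the only input; these are not measured benchmarks):

```python
# Doubling every 2 years: the capability multiplier after n years is 2 ** (n / 2).
def capability_multiplier(years: float, doubling_period: float = 2.0) -> float:
    return 2 ** (years / doubling_period)

print(round(capability_multiplier(15)))   # ~181x over the 15 years since 1998
print(round(capability_multiplier(20)))   # ~1024x over two decades
```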

Fast is an understatement – very fast even for an exponential progression, as it is hard to convey and digest the magnitude of just how fast it is moving. We moved from

  • the early 90s, simply placing history up there, experimenting, and having general content with loose hyperlinking and web logs,
  • to the late 90s, conducting eCommerce, doing math/financial interaction modeling and simulations, and building product catalogs with metadata that allowed us to relate items and say that if a user liked that quality or metadata in something, they might like something else over here,
  • to the early 2000s, engineering solutions including social and true community solutions that began to build on top of relational models and the network effect, use semantics, and continually share content on timelines and where a photo was taken as GPS devices began to appear in our pockets,
  • to the 2010s, or today, where we are looking for new ways to collaborate, find new discoveries in the cloud, and use the billions and billions of sensors and data streams to create more powerful, more knowledgeable applications.

Another way to digest this progression is via the table below.

| Web Version | Time | DIKW | Web Maturity | Knowledge Domain Leading Web | Data Use Model on Web | Data Maturity on Web |
| --- | --- | --- | --- | --- | --- | --- |
| .9 | early 90s | Data | Content | History | Experimental | Logs |
| 1.0 | 1995+ | Info |  | History | Experimental | Content |
| 1.1 | 1997 |  |  | Math | Experimental | Relational |
| 1.2 | 1999 |  | +Commerce | Math | Hypothetical | Metadata |
| 1.3 | 2002 |  |  | Engineering | Hypothetical | Spatial |
| 2.0 | 2005+ | Knowledge | +Community | Engineering | Computational | Temporal |
| 2.1 | 2010s |  |  | Engineering | Computational | Semantic |
| 3.0 | 2015 and the predictable web | Knowledge | +Collaboration | Science | Data as 4th paradigm | TempoSpatial (goes public) |
| 4.0 | 2020-2030 | Wisdom in sectors | Advancing Collaboration with 3rd world core | Advancing Science into Shared Services (Philosophical is out-year) | Robot/Ant data quality | Sentiment and Predictive (goes public/useful; Sensitive is out-year) |

Now, think of the last teenager who could maintain eye contact in a conversation with an adult while holding a phone in their hand and not be distracted by the Pavlovian response to a text, tweet, Instagram, etc. Now imagine, ten years from now, when it's not tidbits of data but, as a call comes in, auto-searches on terms they aren't even aware of coming up in augmented reality – advice on how to react to the sentiment they just received, not just the information. The emotional knowledge quotient will be googled now – “What do I do when…?” – versus critical thinking and living and learning.

So, taking it back to the “now” (though this blog is lacking specific citations – blogs do allow us to cheat, but our research sources will detail and source our analysis): if you agree that spatial mapping for professionals arrived in the early 2000s, agree that it has now hit the public, and understand that spatially tagging data has passed the tipping point with the advent of smartphones, map apps, local scouts, augmented reality directions, and multi-dimensional modeling integrating GIS and CAD with the web, then you can see that the data science maturity stage with the largest impact right now is – geospatial.

Geospatial data is different. Prior to geospatial, data is non-dimensional: it has many attributable and categorical facets, but it does not have to be stored in a mathematical or picture form with a specific relation to a position on Earth. Spatial data – GIS, CAD, lat/longs – has to be stored numerically in order to calculate upon it, and furthermore it has to be related to a grounding point. Essentially, geospatial means storing vector maps or pixel (raster) maps. When you begin to put that together for tens of millions of streams, you get a very large, complicated, spatially referenced hydrography dataset. It gets even more complicated when you overlay 15-minute time-based data such as water attributes (flow, height, temperature, quality, changes, etc.), and more complicated still when you combine that data with other dimensions such as earth elevations and need to relate across domains of science, speaking different languages, to calculate how fast water might carry a certain contaminant down a slope after a river bank or levee collapses.
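A minimal sketch of that difference, assuming the open-source shapely library (the gauge record and coordinates are made up for illustration): a non-spatial record is just keyed attributes, while the spatial version must carry numeric geometry grounded to coordinates before any distance, overlay, or flow question can even be asked.

```python
from shapely.geometry import Point, LineString

# Non-spatial record: attributes only, nothing to compute geometry against.
gauge_attributes = {"station_id": "09123450", "flow_cfs": 312.0, "temp_c": 8.4}

# Spatial version: the same record grounded to numeric coordinates
# (simply lon/lat in WGS84 here; a real analysis would project these).
gauge_location = Point(-105.27, 40.02)

# A stream reach stored as vector geometry rather than a description.
reach = LineString([(-105.30, 40.00), (-105.27, 40.02), (-105.25, 40.05)])

# Because both are numeric geometries, spatial questions become possible:
print(reach.length)                      # reach length in coordinate units (degrees here)
print(gauge_location.distance(reach))    # how far the gauge sits from the reach
print(reach.buffer(0.01).contains(gauge_location))  # is it within a ~1 km-ish band?
```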

Before we can get to those more complex scenarios, geospatial data is the next progression in data complexity.

That said, definitely check out our Geospatial Integrated Services and Capabilities