Veracity Is Still a Struggle for Integrated Information Products – We Need to Learn from These Challenges to Move to Knowledge Products

The issue addressed by this post is that geospatial feature classes are typically managed in silos, and these silos can cause persistent, major data veracity issues (known as uncertain or imprecise data, as shown in the figure below).

When attempting to integrate these data to answer knowledge questions, the silos create issues of projection, granularity, level of detail, provenance, intended use vs. requested use, and many more. For instance, geospatial data used by inventory and asset management systems has become reference data for other major datasets. In such cases, however, the data typically lacks the accuracy and quality needed to support the modeling, simulation, accurate mapping, and artificial intelligence rules required to provide strong confidence in its use.
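To make the projection issue concrete, here is a minimal Python sketch (using the pyproj library; the feature names and coordinates are hypothetical) of reconciling two silos that store geometry in different coordinate reference systems before any integration can happen.

```python
# Minimal sketch: reconciling coordinate reference systems (CRS) before
# integrating two siloed feature classes. Feature names and coordinates
# are hypothetical.
from pyproj import Transformer

# An asset-management silo storing a hydrant location in Web Mercator (EPSG:3857)
hydrant_x, hydrant_y = (-8238310.24, 4970071.58)

# A reference silo storing a road centerline vertex in geographic WGS84 (EPSG:4326)
road_vertex_lon, road_vertex_lat = (-74.0059, 40.7128)

# Reproject the asset point into the reference CRS so the two can be compared
to_wgs84 = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
hydrant_lon, hydrant_lat = to_wgs84.transform(hydrant_x, hydrant_y)

print(f"Hydrant in WGS84: ({hydrant_lon:.4f}, {hydrant_lat:.4f})")
# Without this step, distance or containment tests between the two silos
# silently produce meaningless results -- one source of veracity loss.
```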

Xentity has been architecting in the government space across more than 45 data programs supporting national, agency, state, and local geospatial programs. In doing so, Xentity has developed architecture methods, common architectural patterns, and whitepapers that address designing with “knowledge first” and then working backwards to the data to establish an accurate, usable pattern.

The following statement breaks down the need to start with knowledge first in a Land & Resource Management (KID Paper) business focus area.

“A best practice concerning geospatial is that, in order to achieve geospatial knowledge integration, the data needs to be designed with the “knowledge question” first. Then, build information management production and product generation from that initial question. Finally, align the planning and data acquisition for common reference data and end-user thematic static or streaming data to achieve the knowledge product. This approach needs to be addressed before data can be considered a reliable source for cognitive processing. Most cognitive systems are not addressing the geospatial ‘where’ dimension.”
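One way to picture this “knowledge first” flow, purely as an illustrative sketch rather than a statement of Xentity’s or the KID Paper’s method, is to write the knowledge question down as a structured requirement and derive the reference and thematic data needs from it. Everything in the example below (the question, themes, and accuracy thresholds) is invented for illustration.

```python
# Illustrative only: a hypothetical "knowledge question" expressed as a
# structured requirement, from which data needs are then derived.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataRequirement:
    theme: str                        # e.g., reference hydrography, parcel polygons
    role: str                         # "reference" or "thematic"
    min_positional_accuracy_m: float  # accuracy needed to trust the answer
    provenance_required: bool = True

@dataclass
class KnowledgeQuestion:
    question: str
    requirements: List[DataRequirement] = field(default_factory=list)

# Start from the knowledge question...
kq = KnowledgeQuestion(
    question="Which watersheds face the highest wildfire-to-water-supply risk next season?"
)

# ...then work backwards to the reference and thematic data that must exist,
# with the accuracy and provenance needed to support the knowledge product.
kq.requirements += [
    DataRequirement("hydrography network", "reference", min_positional_accuracy_m=10.0),
    DataRequirement("fuel/vegetation condition", "thematic", min_positional_accuracy_m=30.0),
    DataRequirement("burn probability model output", "thematic", min_positional_accuracy_m=30.0),
]

for r in kq.requirements:
    print(f"{r.role:9s} | {r.theme:30s} | <= {r.min_positional_accuracy_m} m")
```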

Geospatial Data Complexity Beyond XY Point Data Creates Much Broader Challenges, But Much More Value

Most data specialists focus on content, math, language, culture, and temporal analytics. Those dealing with geospatial data, however, are taking on the “point data challenge.” This changes when moving to integrate disparate feature network/line data (e.g., utility, hydrographic, roads, railroads): the issues noted above come into play, and the cognitive rule-base concepts, and possibly the engine processing concepts, change as well.
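As a rough illustration of why the rules change beyond points, the sketch below (using the shapely library; coordinates and feature names are hypothetical) contrasts a simple point-to-point distance check with snapping an observation onto a line network, where position along the line, direction, and segmentation start to matter.

```python
# Sketch of why line/network data is harder to integrate than point data.
# Coordinates and feature names are hypothetical.
from shapely.geometry import Point, LineString

# Point-vs-point integration: a simple distance test is often enough.
gauge_site = Point(100.0, 200.0)
asset_record = Point(100.5, 199.5)
print("point-to-point offset:", round(gauge_site.distance(asset_record), 2))

# Point-vs-network integration: the observation must be snapped onto the
# line and expressed as a position along it (linear referencing), and the
# line's direction, connectivity, and segmentation all start to matter.
road_centerline = LineString([(0, 0), (50, 0), (50, 80), (120, 80)])
crash_report = Point(52.0, 30.0)

measure = road_centerline.project(crash_report)   # distance along the line
snapped = road_centerline.interpolate(measure)    # nearest point on the line
print("snap offset:", round(crash_report.distance(snapped), 2),
      "| measure along line:", round(measure, 2))
```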

The table below presents geospatial data types and their varying levels of complexity, based on the issues and concepts noted above. It compares reference data with static and dynamic thematic data across the data dimensions (the four V’s), an overall value of the data, and the data types (point, line, polygon, etc.). The hope is that this provides insight into why geospatial data can be such a challenge for data specialists. It also shows that all three types of data carry a respectable amount of value, which is why some data specialists bother at all.

| Geospatial Data Types | Reference Data – Static (i.e., landscape topographic) | Thematic Data – Static (i.e., specific agency mission data) | Thematic Data – Dynamic (i.e., IoT, sensors) |
|---|---|---|---|
| Data Dimension Complexity | | | |
| Volume | Low | Medium | High |
| Variety | Low | High | Medium |
| Veracity | Medium | Medium | High |
| Velocity | Low | Low | High |
| Value | High | Medium | Medium |
| Data Type Complexity | | | |
| Point | Basic | Basic | Intermediate |
| Line | High | High | High |
| Polygon | High | High | High |
| Raster/Pixel | Intermediate | Intermediate | Intermediate |
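Read one way, the table is a rough triage rubric. As a minimal sketch, assuming an invented numeric mapping of the qualitative levels (not part of the table itself), it could be encoded so incoming datasets can be scored programmatically:

```python
# Minimal sketch: the complexity table above encoded as a lookup so datasets
# can be triaged programmatically. The numeric mapping is invented for
# illustration only.
LEVEL = {"Low": 1, "Basic": 1, "Medium": 2, "Intermediate": 2, "High": 3}

COMPLEXITY = {
    "reference-static": {"volume": "Low", "variety": "Low", "veracity": "Medium",
                         "velocity": "Low", "value": "High"},
    "thematic-static":  {"volume": "Medium", "variety": "High", "veracity": "Medium",
                         "velocity": "Low", "value": "Medium"},
    "thematic-dynamic": {"volume": "High", "variety": "Medium", "veracity": "High",
                         "velocity": "High", "value": "Medium"},
}

def complexity_score(data_type: str) -> int:
    """Sum the four V complexity levels (excluding value) for a data type."""
    dims = COMPLEXITY[data_type]
    return sum(LEVEL[dims[v]] for v in ("volume", "variety", "veracity", "velocity"))

for dt in COMPLEXITY:
    print(f"{dt:17s} complexity={complexity_score(dt)} value={COMPLEXITY[dt]['value']}")
```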

So What?

Geospatial data clearly becomes more difficult when moving from point and line data to polygonal data. Raster/pixel data can vary in complexity from intermediate to high, mostly due to variance in veracity; however, high-quality raster data could be considered intermediate for real-time feature extraction (for example, moving-object classification for autonomous cars).

These complexities remain high when considering spatio-temporal analysis, whether comparing the current state against future models or leveraging historical data, which can vary widely. Historical data in particular usually suffers quality degradation, which changes or limits its use in knowledge, cognitive, or even routine information processing.

The future will be interesting, because we will see how cognitive systems handle these two issues: the veracity problems of geospatial data and the various complexity principles noted above. While moving to faster batch, services, and performance architectures, you will likely need to support all of these concepts, and the rules-based engine will need to understand how to handle the complexities beyond XY point data first. It is also important to remember that, in spite of these complexities and difficulties, the payoff in geospatial data can be incredibly rewarding, which is why Xentity is so proud to devote itself to its understanding.