Can a predictable supply chain for geospatial data be achieved?

Creating Predictability in the Government’s Geospatial Data Supply Chain…

This article expands upon the presentation “What does geodata.gov mean to data.gov?” given at the First International Open Government Data Conference in November 2010, as well as the recently released GAO report on the FGDC’s role and geospatial information, which places a similar emphasis on getting the data right.

Would it be valuable to establish predictability in the government geospatial data supply chain? 

As examples, what if one could be guaranteed that every year or two the United States Census Bureau, in cooperation with state and local authorities, would produce a high-quality, updated county boundary dataset, or that HHS would produce a geocoded, attributed list of all the hospitals in the country, validated by the health care providers themselves?  Of course it would be valuable, and it would provide the means to minimize redundant data purchasing, collection, and processing.

If the answer is "of course", then why haven’t we done so already? 

It is a simple concept, but one without an implementation strategy.  Twenty years after the establishment of Circular A-16 and the FGDC metadata content standards, we are still looking at metadata from a dataset-centric point of view: that is, for “what has been” rather than for “what will be”.  Knowing what is coming, and when it is coming, enables one to plan.

The model can be shifted to the “what will be” perspective if we adopt a systems-driven data lifecycle view, which means looking at both data predictability and crowdsourcing.

It may seem ironic, in the age of crowdsourcing, to argue for predictable, lifecycle-driven releases of pedigreed information and seemingly deny the power of the crowd.  But the fact remains that civilian government entities in the US systematically collect and produce untold volumes of geospatial information (raster, vector, geocodable attributes) through many systems: earth observation systems, mission programs using human capital, business IT systems, regulatory mandates, funding processes, and cooperative agreements between multiple agencies and all levels of government. Governments in the US are enormous geospatial data aggregators, but much of this work is accomplished in systems that their owners and operators view as special but not “spatial”.

An artificial boundary, or perception, has been created that geospatial data is different from other types of data, and by extension so are the supporting systems.

There remain challenges with data resolution, geometry types, attribution, and so on, but more importantly there is a management challenge here.  All of these data aggregation systems have, or could have, a predictable data lifecycle accompanied by publishing schedules and processing-authority metadata. In turn, the crowd and the geospatial communities could use their digital muscle to complement these systems’ resources if they so desired, and all government programs would be better informed by having predictable data resources.

What is required is communicating each system’s outputs, owner, and timetable.
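As a minimal sketch of what that communication could look like (the schema, field names, and example entries below are hypothetical, not an FGDC or data.gov standard), a publishing-calendar record need only capture those three elements:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PublishingCommitment:
    """One entry in a hypothetical geospatial publishing calendar."""
    dataset: str         # the system's output
    owner: str           # the authoritative producing agency or system
    cadence_months: int  # how often a refresh is promised
    last_release: date   # "what has been"
    next_release: date   # "what will be" -- the planning signal

    def is_overdue(self, today: date) -> bool:
        """A missed commitment date is itself useful planning information."""
        return today > self.next_release

# Illustrative entries only; the dates and cadences are invented.
calendar = [
    PublishingCommitment("County boundaries", "U.S. Census Bureau", 24,
                         date(2009, 1, 15), date(2011, 1, 15)),
    PublishingCommitment("Geocoded hospital list", "HHS", 12,
                         date(2010, 6, 1), date(2011, 6, 1)),
]
```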

Once a data baseline is established, geospatial users and the crowd could determine the most valuable content gaps and use their resources more effectively, in essence creating an expanded and informed community. To date, looking for geospatial information is more akin to an archaeological dig than to searching for a book at the library.
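Continuing the hypothetical sketch above, determining content gaps could then be as simple as flagging lapsed commitments, which gives the crowd a concrete target:

```python
from datetime import date

# Which promised datasets have slipped past their release date?
gaps = [c for c in calendar if c.is_overdue(date(2011, 9, 1))]
for c in gaps:
    print(f"Gap: {c.dataset} (owner: {c.owner}, due {c.next_release})")
```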

What to do?

Not to downplay the significance of geospatial and subject matter experts publishing value-added datasets and metadata into clearinghouses and catalogs, but we would stand to gain much more by determining which finite set of systems aggregates and produces the geospatial data and by creating a predictable publishing calendar.

In the current environment of limited resources, Xentity seeks to support efforts such as the FGDC, data.gov, the other National Geospatial Data Assets, and OMB in shifting the focus to these primary sources of information, enabling the community of use and organizing the community of supply. This model would include publishing milestones, both past and future, that could be used to evaluate mission and geospatial end-user requirements, allow crowdsourcing to contribute, and simplify the search for quality data.
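To illustrate how published milestones could inform planning (again continuing the hypothetical calendar above, not describing any existing data.gov capability), a data consumer could simply ask what is due within a planning window instead of hunting through catalogs:

```python
from datetime import date, timedelta

def upcoming(commitments, today: date, window_days: int = 180):
    """Return commitments due within the planning window, soonest first."""
    horizon = today + timedelta(days=window_days)
    due = [c for c in commitments if today <= c.next_release <= horizon]
    return sorted(due, key=lambda c: c.next_release)

for c in upcoming(calendar, date(2011, 1, 1)):
    print(f"{c.next_release}: {c.dataset} ({c.owner})")
```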