Creating Predictability In The Government’s Geospatial Data Supply Chain…

This article expands upon the presentation "What does geodata.gov mean to data.gov?" given at the First International Open Government Data Conference in November 2010, as well as the GAO report on the FGDC's role and geospatial information, which places a similar emphasis on getting the data right.

Would It Be Valuable To Establish Predictability In The Government's Geospatial Data Supply Chain?

What if one could be guaranteed that every year or two the United States Census Bureau, in cooperation with state and local authorities and the Department of Health and Human Services (HHS), produced a high-quality, updated county boundary dataset? And that this data included a geocoded, attributed list of every hospital in the country, validated by the health care providers? Of course it would be valuable! It would also provide the means to minimize redundant data purchasing, collection, and processing across all agencies concerned.

If The Answer Is “Of Course”, Then Why Haven’t We Done So Already?

It is a simple concept, but one that lacks an implementation strategy. Twenty years after the establishment of Circular A-16 and the FGDC metadata content standards, we are still looking at metadata from a dataset-centric point of view, focused on "what has been" rather than "what will be." Knowing what is coming, and when it is coming, enables one to plan ahead and prepare for change.

How Data Predictability and Crowdsourcing Fit Into the Picture

The model can shift toward a "what will be" perspective, but only if we adopt a systems-driven data lifecycle view, one that takes both data predictability and crowdsourcing into account.

It may seem ironic that, in the age of crowdsourcing, we argue for predictable data lifecycle releases of pedigreed information; in doing so, we seemingly deny the power of the crowd. But the fact remains that civilian government entities in the US systematically collect and produce untold volumes of geospatial information: raster data, vector data, and geocodable attributes. They do so through many systems, including earth observation systems, mission programs using human capital, business IT systems, regulatory mandates, funding processes, and cooperative agreements spanning multiple agencies and all levels of government. US government agencies are enormous geospatial data aggregators. Owners and operators, however, accomplish much of this work in systems viewed as special, not "spatial".

The Artificial Boundary

All of this creates an artificial boundary: the perception that geospatial data is different from other types of data, and by extension that the supporting systems are too. Challenges remain with data resolution, geometry types, attribution, and the like, but more importantly there is a management challenge here. All of these data aggregation systems have, or could have, a predictable data lifecycle accompanied by publishing schedules and processing-authority metadata. The crowd and geospatial communities could then apply their digital muscle to complement these systems' resources, if that is their desire, and all government programs would be better informed by having predictable data resources.
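To make the idea concrete, here is a minimal sketch of what forward-looking lifecycle metadata for one aggregation system's output might contain. The field names and example values are hypothetical illustrations, not drawn from the FGDC metadata content standard or any agency's actual schedule:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetLifecycle:
    """Hypothetical 'what will be' metadata for one aggregation system's output."""
    dataset: str              # name of the published product
    steward: str              # processing authority / owning organization
    update_cycle_months: int  # how often a refreshed release is promised
    last_release: date        # most recent publication
    next_release: date        # the forward-looking commitment

    def is_overdue(self, today: date) -> bool:
        # A published lifecycle lets anyone check whether the promise is being kept.
        return today > self.next_release

# Example entry; the dates and cycle are illustrative only.
county_boundaries = DatasetLifecycle(
    dataset="County boundaries",
    steward="U.S. Census Bureau",
    update_cycle_months=24,
    last_release=date(2010, 1, 1),
    next_release=date(2012, 1, 1),
)
```

The point of the sketch is simply that a schedule and a named processing authority are small, structured additions to metadata, yet they are what turn a catalog of "what has been" into a plan for "what will be."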

What Is Required? Communicating Each System's Outputs, Owner, And Timetables

Once we establish baseline data needs and availability, geospatial users and the crowd could identify the most valuable content gaps and apply their resources more effectively. In essence, this creates an expanded and informed community. To date, looking for geospatial information has been more akin to an archaeological discovery process than to searching for a book at the library.
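As a rough illustration of how a published baseline turns gap-finding into a simple lookup rather than an archaeological dig, the sketch below compares a hypothetical list of needed data themes against themes already committed on some publishing schedule; the theme names are invented for the example:

```python
# Hypothetical data themes a mission program needs, versus themes already
# promised on a published lifecycle schedule (names are illustrative only).
needed_themes = {"county boundaries", "hospitals", "hydrography", "land parcels"}
promised_themes = {"county boundaries", "hospitals", "elevation"}

# With a known baseline, the most valuable content gaps fall out of a set
# difference, showing the crowd where its effort would add the most.
content_gaps = needed_themes - promised_themes
print(sorted(content_gaps))  # ['hydrography', 'land parcels']
```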

What To Do Now?

We do not wish to downplay the significance of geospatial and subject matter experts publishing value-added datasets and metadata into clearinghouses and catalogs. However, we would stand to gain much more by determining which systems aggregate and produce the geospatial data, and by creating a predictable data publishing calendar.
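One way to picture such a calendar is as a simple, queryable list of commitments gathered from each producing system. The sketch below shows how a program planner, or the crowd, could ask what is scheduled to arrive within a planning window; the agencies, products, and dates are placeholders, not actual commitments:

```python
from datetime import date

# A hypothetical publishing calendar: each entry pairs a promised product with
# its steward and a committed release date (all values are placeholders).
calendar = [
    {"dataset": "County boundaries", "steward": "U.S. Census Bureau", "release": date(2011, 3, 1)},
    {"dataset": "Hospitals (geocoded)", "steward": "HHS", "release": date(2011, 6, 15)},
    {"dataset": "Hydrography update", "steward": "USGS", "release": date(2012, 1, 31)},
]

def upcoming(entries, start, end):
    """Return the commitments that fall within a planning window, soonest first."""
    window = [e for e in entries if start <= e["release"] <= end]
    return sorted(window, key=lambda e: e["release"])

# Checking what is promised for calendar year 2011.
for entry in upcoming(calendar, date(2011, 1, 1), date(2011, 12, 31)):
    print(entry["release"], entry["steward"], "-", entry["dataset"])
```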

In the current environment of limited resources, Xentity seeks to support efforts such as those of the FGDC, data.gov, other National Geospatial Data Assets, and the Office of Management and Budget (OMB). Doing so helps shift the focus onto these primary sources of information, enabling the community of use and organizing the community of supply. This model would include publishing milestones from both past and future efforts. In turn, those milestones could be used to evaluate mission and geospatial end-user requirements, allow crowdsourcing to contribute, and simplify the search for quality data.