To do BigData, address Data Quality – People and Processes – Tech Access to Information

Blog post added by Wiki Admin

As a follow-on to the “cliffhanger” on why BigData is a big deal – because it can help answer questions fast – there are three top limitations right now: Data Quality, People and Process, and Tech Access to Information.

Let’s jump right in.

Number One and by far the biggest – Data Quality

Climate Change isn’t a myth, but it is the first science to ever be presented primarily on a data premise. And in doing so, its advocates prematurely presented models that didn’t take into account the driving variables. Their models have changed over and over again. The resolution of their source data has increased. Their simulations on top of simulations have supported countless theories across various models that can only be demonstrated simply by Hollywood blockbusters. Point being, we are dealing with inferior data for a world-scale problem, and we jump into the political, emotionally driven world with a data report? We will be the frog in slowly warming water, and we will hit that boiling point late. All because we started with a data-justification approach using low-quality data. Are they right that the world is warming? Yes. Do they have enough data to prove the right mitigation, mediation, or policy adjustments? No, and not until either we increase the data quality or take a non-data tack.

People and processes are a generation away.

Our processes in IT have been driven by Defense and GSA business models from the fifties. Put anyone managing the 0s-and-1s technology in the back. They are nerds, look goofy, can’t talk, don’t understand what we actually do here, and by the way, they smell funny. That has been the approach to IT since the 50s. Nothing has changed, with the exception that there are a few baker’s dozens of the hoodie-wearing, Mountain Dew-drinking night owls who happen to be loaded now, and there is a pseudo-culture of geek chic. We have not matured our people and talent investment to balance maturity of service, data, governance, design, and product lifecycle, or to embrace that engine culture as core to the business. This means more effective information sharing processes to get the right information to the right people. It also means investing in the right skills – not just feeding Doritos and free soda to hackers – to manage the information sharing and data lifecycle. I am not as worried about this one. As the baby boomer generation retires, it will leave a massive vacuum; Generation X is too small, and we’ll have to groom Generation Y fast. That said, we will mess up a lot and lose a lot to brain drain, but the market will demand relevancy, which will, albeit slowly, create this workforce model in 10-15 years.

Access to Environments 

If you had asked this pre-hosting or pre-cloud, the answer would have been limited to massive corporations, defense, intel, and the parts of academia co-investing with those groups. If you can manage the strain of shifting to a big data infrastructure, this barrier should be the least of your problems. If you can let your staff get the data they need at the speed they need, so they can process it in parallel without long wait times, you are looking good. Get a credit card, or if Government, buy off a Cloud GWAC, and get your governance and policies moving, as they are likely behind and not ready; left alone, they will prolong the siloed information phenomenon. Focus on the I in IT, and let the CTO respond to the technology stack.
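To make the parallel-processing point concrete, here is a minimal Python sketch. The data/partitions layout and the summarize() step are hypothetical placeholders, not anything from a specific program; the point is simply fanning one analysis out across data partitions so nobody waits behind a single long-running pass.

```python
# Minimal sketch: run the same analysis over many data partitions in parallel.
# The directory layout and summarize() logic are illustrative assumptions only.
from multiprocessing import Pool
from pathlib import Path
import csv


def summarize(partition_path: Path):
    """Stand-in analysis: count the data rows in one partition file."""
    with open(partition_path, newline="") as f:
        rows = sum(1 for _ in csv.reader(f))
    return partition_path.name, max(rows - 1, 0)  # subtract the header row


if __name__ == "__main__":
    partitions = sorted(Path("data/partitions").glob("*.csv"))  # hypothetical layout
    with Pool() as pool:  # defaults to one worker per CPU core
        for name, count in pool.map(summarize, partitions):
            print(f"{name}: {count} rows")
```

Any pool of workers against partitioned data gets the same effect; the library matters less than removing the wait.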

Focus on data quality, have a workforce investment plan, and continue working your information access policies

The tipping point that moves you into Big Data is where these, combined, require you to deal with the complicated enormity at speed, not just for MIS and reports, but to help answer questions. If you can focus on those things in that order (likely solving them in reverse), you will be able to implement parallelized data discovery.

This will shorten the distance from A to B and create new economies, new networks, and enable your customer or user base to do things they could not before. It is the train, plane, and automobile factor all over again.

And to throw in the shameless plug, this is what we do. This is why we focus on spatial data science and why change is so fundamental.

The Four V's of BigData – Variety, Veracity, Velocity, Volume

Blog post edited by Matt Tricomi

If we are simply talking about lots of retail data, lots of sales data, lots of management data, and lots of metadata, we aren’t talking BigData. Though for some reason, those data are going through and into the new architectures anyway. Yeah, sure, the retail, defense, and intelligence worlds have been sifting through huge data stores for years.

But the marketing-coined term BigData is not just referring to the Volume of the data. There are 4 V’s of big data, which we have enjoyed using as learned through our information exchanges and partnering with IBM.

The first two V’s focus on the mission side: Variety and Veracity.

Variety (how many categories of data it covers) includes technology (sources, formats (e.g. numeric, text, objects, geocoded, vector, raster, structured, unstructured – email, video, audio, etc.), and methods) as well as legal aspects (complexity, privacy, jurisdiction). Essentially, it is working with various types of data across various dimensions (temporal, geospatial, sentiment, metadata, logs, etc.).

Veracity (understanding the authority of the data) includes known data quality, type of data, and data management maturity, so that you can understand how much you can trust that the data is right and accurate.

The other two focus on speed and amount: Velocity and Volume.

Volume (how much data) – capturing, processing, reporting on, and managing a large volume of data

Velocity (how often it changes, or real-time) – analyzing and exploiting lots and lots of new data in real time
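
To make the four V’s a bit more concrete, here is a minimal sketch in Python of profiling a dataset against them. The fields, thresholds, and example numbers are illustrative assumptions only, not anything defined by IBM or Gartner.

```python
# A minimal sketch of scoring a dataset against the four V's.
# Fields, thresholds, and example values are made up for illustration.
from dataclasses import dataclass


@dataclass
class FourVProfile:
    variety: int             # distinct formats/dimensions (text, raster, logs, ...)
    veracity: float          # 0.0-1.0 trust score from quality and management maturity
    velocity_per_sec: float  # new records arriving per second
    volume_gb: float         # total data under management, in gigabytes

    def looks_like_big_data(self) -> bool:
        """Crude rule of thumb: any one V pushing past a comfort threshold."""
        return (self.variety >= 5
                or self.velocity_per_sec >= 10_000
                or self.volume_gb >= 10_000)


profile = FourVProfile(variety=7, veracity=0.6,
                       velocity_per_sec=25_000, volume_gb=50_000)
print(profile.looks_like_big_data())  # True
```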

Changes to the V’s over the last 15 years

Veracity is the newer one. The concept of data quality has always been the orphaned step-child. It is the I in IT. It’s the part of the iceberg under the water. All IT vendors want to sell speed, handling lots of data, and some commodity variety support, but once sold, you are on your own (or paying $300/hour for mission customization). But we are happy to see IBM got there and added it.

As of late, Value has been introduced as well. Paraphrasing the Spanish article: even if you can produce information, if there is no real action that can be taken with it, it’s not Valuable to the organization.

Then again, some have accused IBM of stealing the V’s from Gartner’s 2001 development of the V’s – the V-gate controversy.

Nonetheless, if anything it proves that this is a concept worth fighting over; dig into the meaning and you will see it has its merits.

IT Footprint Progression in the Federal Government… and the role of Architecture

Blog post edited by Matt Tricomi

In early December 2011, Xentity architects had a chance to discuss future progression with current and former OMB Chief Architects. The following captures some of the concepts discussed for architecture in the context of the overall IT progression in the Federal Government: looking back at where IT has been, and, given current initiatives, what could be further emphasized, integrated, or uncovered moving forward.

PDF: ITProgression-Policy-Arch-Role.pdf

Where IT has been

Organization Information Role

IT originally was built as a federated cost center. IT then moved into an Agency Data Centers model for archive/records. Commerce models of the economy drove eGov without migrating IT to service models. Government got swamped in the T of IT and hasn’t gotten back to the I.

Trained Workforce

IT acquisition originally was about financing/leasing federated, shared, manufactured systems. Then client/server came, which is solution architecture, and 1000s of stacks came with it. Acquisition never got trained or linked with EA, and engineering mostly got outsourced.

Data Center Footprint

Federated regional centers moved to mission centers, which are moving to server rooms due to 1000s of system configurations. The initial FDCCI will close 1/3 of the portfolio count, but the other 2/3 will have high transition costs.

Technology Stack

The IT stack started monolithic and slowly moved into server tiers, then into service components. Though this reduced individual system development cost, O&M Total Cost of Ownership was ignored, and system security and interoperability were low.

Recommendations Discussed

Organization Information Role

– Institutionalized Funding for Enabling G2G IT Services at GSA
– Position G2C Services as High-Availability Service Centers
– Focus Mission on Data Services training on Data Lifecycle that allows Private, NGO, Citizens to build on top of
– Increase EA role in support of mission and policy analysts for depth reachback to increase new political appointee effectiveness and limit turnover/takeover disruption

Trained Workforce

– Position EA as depth knowledge base for acquisition guidance reachback
– Train all 3 workforce components on Performance maturity, Common Stack Architecture, and FEA/FSAM v2
– Formally establish the CIO and EA relationship with Acquisition
– Retire ineffective existing burdens to demonstrate credibility, eat your own dog food, and increase the chances of success

Data Center Footprint

– Increase CMMI/ITIL Requirements to lower O&M
– Continue FDCCI
– Guide the 2nd consolidation phase by taking advantage of CPIC renewal or new system cycles to ensure re-use of new platform or data services in new shared platform environments, avoiding a simple hardware/system “cowpath” migration

Technology Stack

– Establish a Common Stack Architecture to be cloud managed service platforms by Select Large Agencies, waived Large Programs, and GSA
– Manage a True Common Stack Portfolio definition and implementation status
– Target New Solutions and Existing Systems for collaborative evaluation