To do BigData, address Data Quality – People and Processes – Tech Access to information

Blog post
added by
Wiki Admin

As a follow on to the “cliffhanger” on BigData is a big deal because it can help answer questions fast, there are three top limitations right now: Data Quality, People and Process, Tech Access to Information. 

Lets jump right in.

Number One and by far the biggest – Data Quality

Climate Change isn’t a myth, but it is the first science to ever be presented on a data premise. And in doing so, they prematurely presented models that didn’t take into account the driving variables. Their models have changed over and over again. Their resolution of source data has increased. Their simulations on top of simulations have proven countless theories of various models that can only be demonstrated simply by Hollywood blockblusters. Point being, we are dealing with inferior data for a world scale problem, and we jump into the political, emotional driven world with a data report? We will be the frog in slowly warming water, and we will hit that boiling point late. All because we started with a data justification approach using low quality data. Are they right the world is warming? Yes. Do they have enough data to proven the right mitigation, mediation, or policy adjustments? No, and not until either we increase the data quality or take a non-data tact.

People and processes is a generation away.

Our processes in IT have been driven by Defense and GSA business models from the fifties. Put anyone managing 0s and 1s technology in the back. They are nerds, look goofy, can’t talk, don’t understand what we actually do here and by the way, they smell funny. That has been the approach to IT since the 50s – nothing has changed with the exception that their are a few bakers dozen of the hoodie wearing, mountain dew drinking, late night owls who happen to be loaded now, and their is a pseudo culture of geek chic. We have not matured our people talent investment to balance maturity of service, data, governance, design, and product lifecycle to embrace that engine culture as core to the business. This means, more effective information sharing processes to get the right information to the right people. This also means, investing in the right skills – not just feeding doritos and free soda to hackers – to manage the information sharing and data lifecycle. I am not as worried about this one. As the baby boomer generation retires, it will leave a massive vacuum as Generation X is too small and we’ll have to groom Generation Y fast. That said, we will mess up a lot missing a lot of brain drain, but market will demand relevancy which will, albeit slowly, create this workforce model in 10-15 years.

Access to Environments 

If you asked this pre-hosting environments or pre-cloud, this would have been limited to massive corporations, defense, intel, and some of the academia co-investing with those groups. If you can manage the strain of shifting to a big data infrastructure, this barrier should be the least of your problems. If you can allow your staff to get the data they need at the speed they need so they can process in parallelization without long wait times, you are looking good. Get a credit card, or if Government, buy off a Cloud GWAC, and get your governance and policies moving, as they are likely behind and not ready. Likely they will prolong the silo’d information phenomenon. Focus on the I in IT, and let the CTO respond to the technology stack. 

Focus on data quality, have a workforce investment plan, and continue working your information access policies

The tipping point that move you into Big Data is where these combined require you to deal with the complicated enormity at speeds answering questions not just for MIS and reports, but to help answer questions. If you can focus on those things in that order (likely solving in reverse), you will be able to implement parallelization of data discovery.

This will shorten the distance from A to B and create new economies, new networks, and enable your customer or user base to do things they could not before. It is the train, plane, and automobile factor all over again.

And to throw the shameless plug in, this is what we do. This is Why we focus on spatial data science and Why is change so fundamental.

When you are ready to move into BigData, it means you are wanting to Answer new questions.

Blog post
added by
Wiki Admin

The article on Throwing a Lifeline to Scientists Drowning in Data discusses how we need to be able to “sift through the noise” to be able this faster and faster deluge of sensors and feeds. Its not amount information management models of fast, large retail or defense data. It is about finding the signals you need to know to take advantage.

In controlled environments, like retail and business, this has been done for years on end to guide business analytics and targeted micro-actions. 

For instance, gambling industries have been doing this for 15 plus years taking in all the transnational data of each pull of a slot machine from all their machines from all their hotels linked with your loyalty card you entered and time of year and when you go, your profile, your trip patterns, then laws allowing, they adjust the looseness of the slots, the coupons provided, the trip rewards all to make sure they do what they are supposed to do in capitalism – be profitable. 

Even in uncontrolled environments such as intelligence, defense or internet search, the model is build analytics on analytics to improve the data quality and lifecycle so that the end analytics can improve. Its sound equalizers on top of the sound board.

Do go for the neat tech for your MIS. Go because your users are asking more of you in the data information knowledge chain. 

Continue on to read more on our follow-on article: BigData is a big deal because it can help answer questions fast

Some favorite TED talks

Blog post
edited by
Wiki Admin
– “Migrated to Confluence 5.3”

A business partner last night said “I don’t wake up and turn on my phone, or watch TV, or check email right away. I try to keep it simple… ” he said as several of us waxed rhapsodic of the pre-pocket tech and internet days and how teenagers patterns know no other world. But yet he continued, “OK, well that’s not true, I do get my morning dose of TED for inspiration”. 

Its just one more to add to the many morning intake mediums. People seeking personal philosophical guidance in the morning through religion, scripture, reading a story, meditation, prayer, mind-body engagement or quiet time. People seeking temporal context in morning news TV, newspaper, internet and feeds, websurfing (can I still use that term?), tablet time. People seeking social engagement with morning coffeee at the diner with the guys/gals, spouse or/and kid quality time, the facebook rise-and-shiner, or other social media digests. People seeking inspiration in either of the above

Personally, I have yet to ever find my morning ritual and I bounce in different mediums. Sometimes, its playing trains or toys or some activity with the family when we get a good rhythm going that morning, sometimes it is tablet browsing when feeling curious on various news or video feeds, sometimes it is mindless TV news digestion, and probably more rare than I should, sometimes it is outside quiet time in a run, bike, walk, or reading or whanot. Other times, the day gets going to fast, and there is no interstitial time, and an east coast call to this mountain time zone starts right up.

Though, I haven’t found my rhythm, but over the last partial decade here are a few of the greatest TED hits I’ve tweeted out as greatest hits and found inspirational :

Hans Rosling: Stats that reshape your world-view (Jun 2007)

Geoffrey West: The surprising math of cities and corporations (July 2011)

TEDxUofM – Jameson Toole – Big Data for Tomorrow (May 2011)

Eli Pariser: Beware online “filter bubbles” (Mar 2011)

Sugata Mitra: Build a School in the Cloud (Feb 2013)

Deb Roy: The birth of a word (Mar 2011)

-mt

Why we focus on spatial data science

Blog post
edited by
Matt Tricomi

The I in Information Technology is so broad – why is our first integrated data science problem focus on spatial data? It doesnt fit when looking on face of our Services Catalog . We get asked this a lot and this is our reason, and like Geospatial, its multi-dimensional spanning different ways of thinking, audiences, maturity, progressions, science, modeling, and time:

 

In green, x-axis, is the time progression of public web content. The summary point is data took the longest period – about 10-15 years. And data can only get better as it matures into being popular 25 years old on the web. We are in the information period now, but moving swiftly into the knowledge period. Just see how much more scientific data visualizations, and dependence we are on the internet. Just think how much you were on the web in 1998 compared to 15 years later – IT IS IN YOUR POCKET now. 

This isn’t just our theory.

RadarNetworks put together the visual of progressing through the web eras. Web 1.0 was websites or Content and early Commerce sites. Web 2.0 raised the web community with blogs and the web began to link collaboratively built information with wikis. Web 3.0 is ushering in the semantic direction and building integrated knowledge.

Even scarier, Public Web Content progression lags several business domains, but not necessarily in this leading order: Intelligence, Financial, Energy, Retail, and Large Corporate Analytics. Meaning, this curve reflects the Public maturity, and those other domains have different and faster curves. 

The recent discussions on intelligence analysis linking social/internet data with profile, Facebook/Google Privacy and use for personalized advertising, level of detail SalesForce knows about you and why companies pay so much for a license/seat, how energy exploration is optimizing where to drill in some harder to find areas, or the absolute complexity and risk of the financial derivatives as the world market goes – these technologies usually lag in how we integrate public content for googling someone, or using the internet to learn more and faster. Reason: Those do not make money. Same reason why the DoD invented the internet – it was driven by security of the U.S. which makes money which makes power. 

So, that digression aside (as we have been told “well, my industry is different”), the public progression does follow a parabolic curve that matches Moore’s Law driving factor in IT capability – every 2 years, computing power doubles in power, at same cost (paraphrasing). The fact that we can do more faster at quality levels means we can continue to increase our complexity of analysis in red. And there appears to be a stall not moving towards wisdom, but as we move toward knowledge. Its true our knowledge will continue to increase VERY fast, but what we do with that as a society is the “fear” as we move towards this singularity so fast. 

Fast is an understatement, very fast even for logarithmic progression as its hard to emote and digest the magnitude of just how fast it is moving. We moved from

  • The early 90s simply placing history up there and experimentation and having general content with loose hyperlinking and web logs
  • to the late 90s conducting eCommerce and doing math/financial interaction modeling and simulations and building product catalogs with metadata that allowed us to relate and say if a user found that quality or metdata in something, it might liek something else over here
  • to the early 2000s to engineering solutions including social and true community solutions that began to build on top of relational and the network effect and use semantics and continually share content on timelines and where a photo was taken as GPS devices began to appear in our pockets
  • To the 2010s or today where we are looking for new ways to collaborate, find new discoveries in cloud, and use the billions and billions on sensors and data streams to create more powerful more knowledgable applications

Another way to digest this progression is via the table below.

Web VersionTimeDIKWWeb MaturityKnowledge Domain Leading WebData Use Model on WebData Maturity on Web
.9early 90sDataContentHistoryExperimentalLogs
1.01995+Info HistoryExperimentalContent
1.11997  MathExperimentalRelational
1.21999 +CommerceMathHypotheticalMetadata
1.32002  EngineeringHypotheticalSpatial
2.02005+Knowledge+CommunityEngineeringComputationalTemporal
2.12010s  EngineeringComputationalSemantic
3.02015 and predictable webKnowledge+CollaborationScienceData as 4th paradigm notTempoSpatial (goes public)
4.02020 -2030Wisdom in sectorsAdvancing Collaboration with 3rd world coreAdvancing Science into Shared Services – Philsophical is out yearRobot/Ant data qualitySentiment and Predictive (goes public/useful) – Sensitive is out year

Now, think of the last teenager that could maintain eye contact in a conversation with an adult while holding phone in their hand and not be distracted by the pavlovian response of a text, tweet, instagram, etc. Now imagine, ten years from now, when its not tidbits of data, but as a call comes up, auto-searching on terms they arent aware of come up in augmented reality. Advice on how to react on the sentiment they just received – not just the information. The emotional knowledge quotient will be google now – “What do I do when?” versus critical thinking and live and learn.

So, taking it back to the “now”, though this blog is lacking the specific citations (blogs do allow us to cheat, but our research sources will make sure to detail and source our analysis), if you agree that spatial mapping for professional occurred in early 2000s and agree now that it has hit the public and understand that spatially tagging data has pass the tipping points with advent of smartphones, map apps, local scouts, augmented reality directions, and multi-dimensionl modeling integrating GIS and CAD with web, then you can see the data science maturity stage we are in that has the largest impact right now is – Geospatial.

Geospatial data is different. Prior to geospatial, data is non-dimension-based. It has many attributable and categorical facets, but prior to spatial data, that data does not have to be stored as a mathematical or picture form with specific relation to earth position. Spatial data – GIS, CAD, Lat/Longs, have to be stored in numerical fashion in order to calculate upon it. Further more, it hasnt be be related to a grounding point. Essentially, geospatial is storing vector maps or pixel maps. When you begin to put that together for 10s of millions of streams, you get a a very large complicated spatially referenced hydrography dataset. It gets even more complicated when you overlay 15-minute time-based data such as water attributes (flow, height, temperature, quality, changes, etc.) with that. Even more complicated when you combine that data with other dimensions such as earth elevations and need to relate across domains of science, speaking different languages to be able to calculate how fast water may flow a certain contaniment down a slope after a river bank or levy collapses.

Before we can get to those more complex scenarios, geospatial data is the next progression in data complexity .

That said, definitely check out our Geospatial Integrated Services and Capabilities