Data Patterns And Changes - New And Same Old - Large Data Program Consulting

End Users are demanding more Knowledge from Data Suppliers and from their own systems, so Data Suppliers must increase not only the quantity of their Data but also its quality. In other words, their product needs to constantly evolve in order to stay relevant. Recently, at the International Map Industry Association’s (IMIA) annual East Coast Meet-up in Washington DC, our Chief Architect, Matt Tricomi, presented on the “Top 5 Challenges and Top 9 Tips to becoming Data-Driven Organizations”. The presentation covers the issues plaguing organizations like those that attend IMIA. The IMIA is a global organization that represents the world of maps and brings together leaders from across the spectrum of mapping and location-oriented businesses to connect, share and learn. Check out the key challenges and solutions below and see what may apply to your data organization.

Top 5 Challenges to Data Transformation

1. Changing Thinking On Getting Knowledge From Data

There is a great deal of pressure to advance technology without moving the organization in the wrong direction. In particular, most people recognize the depiction of how the flow of enterprise data management has changed over the past twenty years. Putting that history in the context of Conway’s law helps ease the incredulous feeling of why ‘we’ all can’t get it right.

“organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”

— M. Conway^[2]

So What?

Think about when we started investing. Before the internet, MIS ruled the data, so MIS ran the data lifecycle from collection to data model to end data product. When we shifted to the internet, which opened access to untrained users, we didn’t change this model. We put HTML on our MIS systems. We’re still siloed, yet we offer those silos in an interconnected world to users who aren’t trained, and don’t care to be trained, in your presentation of the data. They want to ask your data what they want to know. Yet they are forced to ask via your model, your chosen fidelity, and your pre-constructed relationships. It’s akin to using the old TV Guide model to access Netflix: the time when Matlock is on no longer matters, so why organize it that way?

This needs to change. This is our KID concept. Data Suppliers and End Users struggle with many of the same issues they did over 20 years ago. What is holding back the evolution of data mining is the older operating model of Management Information Systems (MIS), which limits data access and analysis while focusing on a few objectives. What Xentity is suggesting is that Data Suppliers and Users should adopt the Xentity Knowledge-Information-Data (KID) Operating Model, which provides for more open access of data, across multiple platforms, and provides support for Knowledge and Information queries of data.

2. The Changing World Impacts Our Requirements

Simply put, IT is agile because the world’s factors, both external and within the enterprise, are changing. It’s a complex world now, versus the merely complicated days when these factors changed more slowly (or we were at least shielded from them).

External Threats	Enterprise Weaknesses
World Population 2x in a generation	Resistance to Change Planning
Looming U.S. brain drain	Paving Cow Paths
Legacy Physical Resources reaching limits	Geek Speak
Sensationalizing without context to masses	Poor Modernization Blueprints/Plans
Rapidly Changing consumption patterns	Islands of Automation
Faster, Global Transactions	Redundant Buying
Moore’s Law too fast?	Project Accountability & Delivery
New Technologies, New Markets – fast	Bad or irrelevant data
Governing Law is too complicated, resource-consuming	Poor Security
Shorter Term Policy, Misaligned Investment	Non-Compliant Contractors and Too much change

3. Consistent Theme: Increase Information Density As Data Volume Increases

This picture shows data volume growing linearly, yet in reality, with the explosion of IoT and Big Data, combined with more geospatial and even more open data, its growth trend is exponential. Reducing this mass of data into denser information is the key challenge in avoiding ‘overload’. Data capture can be a widely dispersed activity.

This is especially true now that data has been expanded to great volumes in the past several years, leading to Big Data challenges. The issue here is that there is so much to take in. What’s more, the volume of data continues to grow in the data centers where data is being held. It makes the process seem overwhelming. Thankfully, it is practically a requirement for new technology to be able to hold vast amounts of data in this data-heavy era. This is the only way for many entities and their products to remain relevant in a flatter world of business. Data has simply become too important for businesses not to take it seriously.

4. Consistent Data Asks

We’ve found that the 5 basic needs behind end users’ epic data asks have remained constant over the last 20 years:

Discovery – “I know the information exists, but I can’t find it or access it conveniently.”
Usability – “If I can find it, can I trust it?”
Collaboration – “I don’t know who else I could be working with or who has the same needs.”
Budget Limitations – “I have no way to share costs across the Organization.”
Capability – “The existing organizational data and geospatial service capabilities do not meet my needs.”

They usually come in roughly this order of priority as well: Where can I find it? Is it good? Who can I work with? How much? Does it match my needs? These asks are consistent, yet we keep missing the mark. See Conway’s Law again.

5. So, “Why Hasn’t This Problem Been Solved Before?”

Conway’s Law aside, after a generation or two of facing these basic needs, the question gets asked: “Why hasn’t this problem been solved before?” Why, in fact, haven’t any of these issues been solved before?

And of course, we think we have a key: not the key, but a key. Since 2001, well after the transition to the internet, we’ve seen our clients continue to address data issues through the Management Information System (MIS) Operating Model rather than our preferred model, Knowledge-Information-Data (KID), which prepares data for an environment where we do not yet know what the user will want to know or how they will ask. Meaning:

Data Lifecycle methods are system oriented vs. Enterprise
Geospatial data has been characterized as “different”
Cross-cutting nature stymies enterprise adoption
Data Acquisition / Supply concepts are segmented by policy programs and organizational models
Geospatial is often orphaned; languishing somewhere between business (Part of data) and IT (GIS)
Service Oriented Architectures have not been mature enough
Computing platforms are not open or scalable enough
Operational Expense outpacing Capital Investment
Standards were not mature enough; localized adoption
Demand from consumers was not there
Business benefit not clearly understood by executives

9 Tactical Tips To Moving To A Data-Driven Organization

#1 – Start And Accept The Data Value Chain Has Two Families…Communities

Consider data as having two sides: two groups, or communities, with different drives. These are the “Suppliers of Data” and the “Users of Data” (below). This approach helps in identifying issues related to data access, quality, completeness, and timeliness.

Figure X – Communities of Suppliers and Users

Communities of Suppliers are not self-organizing, and the trend is towards data “pull” models:

Data Suppliers and End Users need to identify and develop targeted communications.
Requires investment to create buy-in among Data Suppliers and End Users.
End Users need supplemental ways to harvest data in order to get the content they require.
Data Suppliers and their activities are actually where the bulk of the work resides.

Communities of Users are self-organizing and the trend is towards “push” models:

End Users use their own systems to access data where they do their work, and are comfortable using their own data mining and management tools.
Provides “Search Enablement” and the ability to catalog spatially-referenced, federated data providing easy access to data downloads, map services, and notifications.
Data Users search for thematic content first which is supported by location, time, authority, quality, and finally its data structure and delivery packaging.
End Users perceive this to be where the bulk of the work resides.

What this does is separate the problem into two parts, so you can work with each community according to its focus and its cultural need. Stop organizing the users so much: give them what they ask for in a flexible way, and they’ll figure it out. Start coordinating, governing, facilitating, prototyping with, and piloting (in short, organizing) the suppliers to avoid the silos and allow data to be delivered in a way that enables ease of access and use. Their mission can stop there.

#2 Accept That There Are 3 Groups Of End Data Solutions

The MIS solution, the system that works for your organization, is no longer the only end use for your data. Of course, that is what the funding is directed at. That is the purpose of acquiring the data, producing it, and generating products and the resulting services, systems and apps.

Yet consider that there are a few progressions of the web. We will jump into this more in another blog, yet in short:

Some will want to connect your data to their content with increased semantic understanding for knowledge and transactions (Commerce, upper left).
Some will want to simply expose your data to their community in hopes of connecting people so they become more aware of and familiar with your data and consider it for use (Community, lower right).
Some will want to take your data and their separate data to the next level, make new data solutions, and collaborate to ‘build on the shoulders of giants’ (Collaboration, upper right).
Finally, some may have no purpose and simply stumble upon your data, yet only if you are discoverable and accessible in a form that supports uses, access, or asks they do not yet know about (Content, lower left).

That said, shape your data to address your priority personas, and be prepared to answer the 5 asks noted earlier. Let’s drill down into that.

#3 End User Asks of Data Discovery – “I Know The Information Exists, But I Can’t Find It Or Access It Conveniently.”

Apps, Portals, and “in the wild” searches have the Discovery capability, yet they are only as good as the metadata that is there. Metadata efforts built on traditional library science models are still struggling. Policy and Standards only get you so far, and the burden soon becomes insurmountable.

Given this, new classification tools can help the stymied state of metadata. With limited workforce resources, and if you can’t bring in interns, students, etc., consider machine learning or semantic publishing to schema.org to take a small bite out of it. It’ll likely hit 80% of your metadata needs (following Pareto’s concept), yet the final 20% will take 80% of the effort.

If you focus on discovery metadata, you can take that first 80% win and move forward, worrying about the scientific and technical detail later. This will be heresy to many, yet it fits our principle that ‘less sooner’ is more valuable. Keep in mind that most Open Data Operating Models are at Level 1 (Catalogs), so the bar you are picturing is likely higher than it really is. Yes, in the long run, users want a discovery-to-stream experience like voice searching on a Roku, which connects a federated search, talking to a remote, and bringing up many services, so that in one click they are streaming that movie from one of the services within 5 seconds.
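As a rough illustration of the semantic publishing to schema.org mentioned above, the sketch below builds a small discovery-level metadata record as JSON-LD using the schema.org `Dataset` and `DataDownload` types. The helper name `build_dataset_jsonld`, the dataset values, and the example.org URL are all hypothetical placeholders, not part of the original presentation.

```python
import json


def build_dataset_jsonld(name, description, keywords, download_url, file_format):
    """Return a schema.org Dataset description as a JSON-LD string.

    Only discovery-level fields are included; deeper scientific or
    technical metadata can be added later.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                "encodingFormat": file_format,
            }
        ],
    }
    return json.dumps(record, indent=2)


# Hypothetical dataset; every value here is a placeholder for illustration.
jsonld = build_dataset_jsonld(
    name="County Elevation Tiles",
    description="Example elevation tiles for a single county.",
    keywords=["elevation", "geospatial"],
    download_url="https://example.org/data/elevation.zip",
    file_format="application/zip",
)
print(jsonld)
```

The resulting JSON-LD would typically be embedded in the dataset's landing page so that general-purpose search engines and catalogs can pick it up, which is one way to get the first 80% of discovery without a full metadata program.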

#4 End User Asks Of Data Usability – “If I Can Find It, Can I Trust It?”

Building on the previous tip, discovery is key for this as well. The focus here is on the supplier side of the value chain. We need to shift away from legacy data planning requirements that are siloed and address only the known, program-specific requirement. The designs also need to go beyond compliance MIS processing, Records, and Documents. Geospatial data suppliers need to look beyond their own data and link to Geographic Foundation or Thematic Datasets (National, Regional, Local). As well, the end use model, be it Scientific, Technological, Engineering, or Math, needs the data to be published in more than the organization’s own model. Yes, this comes back to Conway’s Law again.

Now the new Geospatial Data Act over the next few years should help with the Geospatial data – likely focused on elevation and imagery. Policies such as GDA and new Open Data Policies are now driving to support the future unknown questions. Consequently, data acquisition needs to build in future requirements. Yet this will take some time.

Also, consider that the data from your new investments in sensors or ongoing collection is now being sought in Open Data Environments. Aside from PII scraping, consider opening feeds directly to the public or to a subscription-based offering, bypassing your MIS. This opens up the QA’d data directly to users as they discover the specific knowledge they seek. New policies are pushing this way.

#5 End User Asks Of Data Supplier And User Collaboration: “I Don’t Know Who Else I Could Be Working With Or Who Has The Same Needs.”

Simply put, move your development environments not only to the cloud available to your enterprise, but also out to your collaborative community beyond your enterprise. Get into Collaborative Workbenches. You can use them to train people on the use of your data. Set up containers with sensors, feeds, common tools, and data slices for quick access, play, training, discovery, and plugfests. This allows exposure to notebooks in shared environments for your data. And you only need to put data slices or development-level service access in cloud workbenches to try out whether the data quality, provenance, etc. are ready for what scientists, modelers, analysts, and data scientists may ask of it.

Again, the goals of the new Data Acquisition Policies (e.g. GDA) are to force the issue of marketplace-based validation prior to procurement. Jumping ahead and having workbenches could therefore be a new way of RFI’ing and seeing vendors’ data and tools in action in your environment, versus some prettied-up demo staged with fake data.

#6 End User Asks Of Sharing Costs – “I Have No Way To Share Costs Across The Organization.”

The biggest way to make room is to keep focusing on reducing technical debt. IT Service vendors have truly made it over the crest of the early-2000s failures in application service provider models. The cloud model is truly progressing to reduce costs, moving toward true utility cost models with virtualization, containerization, federation, rapid increases in computing, and infrastructure sharing, while finally catching up to Federal Security requirements. Keep working on infrastructure procurement: Cloud is hard, yet worth it. Watch out for FedRAMP, and do not be sticker shocked by the managed service provider pricing models. It is worth it to move up the value chain to focus on your data, your mission, and your community needs.

Also, work on getting your leaders trained on Cloud BEYOND IaaS. The cloud is so much more than cheaper and faster computing, storage, and passing the IT compliance buck. There are so many amazing platforms for ‘common’ data services, data pipelines, document OCR, and more. The data science tools can be on workbenches to get your workforce trained, playing, plugging in, and checking out the AI, ML, and DL tools. And it is so easy now to copy one data scientist’s ‘desktop’ with all their installs, favorites, and data setup, say “here” (via a container), and even use or federate it across different clouds. Keep pushing your IT, and you will get to these tools.

#7 End User Asks of Data System Capabilities – “The Existing Organizational Data And Geospatial Service Capabilities Do Not Meet My Needs.”

Continuing, of course, to build on the previous Discovery + Usable + Collaboration + Budget questions and ideas: to truly expose your data outside of your organization’s information flow, consider a new data pipelining flow. Also, design Data Stores to support questions not yet known. That means you should not force your potential non-enterprise users into the way you molded your data into your information model or warehouse construct. See if you can provide native access to help keep the original fidelity, since who knows what questions will come. Finally, prepare to move from ETL to ELT, from warehouse to lake (native-form store), and to different data pipeline tools.
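To make the ETL-to-ELT shift a little more concrete, here is a minimal sketch, assuming an in-memory SQLite table stands in for the native-form store. Records are loaded untouched, and each transform is applied only when a question is asked. The table name `raw_records`, the helper functions, and the sample sensor records are hypothetical, chosen purely for illustration.

```python
import json
import sqlite3


def load_raw(conn, records):
    """Load step: store each record untouched, as JSON text."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_records (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_records (payload) VALUES (?)",
        [(json.dumps(r),) for r in records],
    )
    conn.commit()


def query(conn, transform):
    """Transform step: applied at read time, once per question asked."""
    rows = conn.execute("SELECT payload FROM raw_records ORDER BY rowid")
    results = []
    for (payload,) in rows:
        shaped = transform(json.loads(payload))
        if shaped is not None:  # a transform may skip records it does not need
            results.append(shaped)
    return results


# Hypothetical sensor records with uneven attributes, kept in native form.
conn = sqlite3.connect(":memory:")
load_raw(conn, [
    {"site": "A", "temp_c": 20.0, "note": "sensor recalibrated"},
    {"site": "B", "temp_c": 15.0},
])

# Two different questions asked of the same untouched data.
temps_f = query(conn, lambda r: (r["site"], r["temp_c"] * 9 / 5 + 32))
noted_sites = query(conn, lambda r: r["site"] if "note" in r else None)
print(temps_f, noted_sites)
```

Because nothing is reshaped on the way in, a question nobody anticipated (here, which sites carry a note) can still be answered without reloading or remodeling the data.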

When exposing the data, some of it may be too big or have too many unwieldy attributes. Consider exposing data as microservices and on iPaaS platforms. Register these services so that discovery catalogs, engines, and then MIS/Apps can find them. Imagine one-click ‘open-in’ access to these services in popular platforms and tools.

Also, IT platforms offer almost as much savings as IT infrastructure. So many are customized or duct-taped. Attempt to move to PaaS-configured environments for Web Apps, Mobile Apps, and Voice Skills. With that, once you can see beyond the procurement hurdles, have the vision to start sharing enterprise infrastructure, which means sharing workbenches, sharing supply, and sharing services and workflows. Furthermore, get out of custom development. No, your developers of yesterday will not like this. If they cannot shift to data applications, visualization, or data science, you may consider that they are weighing you down. The culture of custom Government frameworks where COTS can prevail needs to stop. Don’t let ‘geekspeak’ fool you.

Yet, if you must remain in custom code, refactor to support Open Source or shared modules. Let the developer community beyond your enterprise nurture and care for the code so that it does not become your technical debt and burden.

#8 – Release Less Sooner – Minimum Viable Product

The only way to see if we are addressing the asks is to release features, not full systems, in alphas and betas sooner. This is a quick summary of the “Minimum Viable Product (MVP)” concept from our blog on “Less Sooner MVP”, which aims to let the Data Supplier obtain the quickest reaction possible from the End User.

It could be a simulated data concept model, a new template not in final code, a design in mockup form, or a new configuration on cloud sandbox. Regardless, Data Suppliers should work with End Users to provide data solutions “less sooner” while providing valued, positive experiences on a consistent basis. MVP is also a far more efficient way to develop a client solution. Imagine not using the MVP method, and you do all that work. Then you discover it is incorrect or not what your End User even wanted. Consequently, you have wasted time and resources that you could have saved. However, had you simply used the MVP method for constant checkpoints, you would have most likely remained on the correct path.

For example, an End User’s desired data product may be a Cadillac. However, your first phase needs to deliver a skateboard architecture to get them moving immediately (see Figure X below). This is the heart of MVP: the skateboard can then be upgraded to a scooter, then a bicycle, and then a motorcycle. Consequently, the client moves faster and faster every time. By giving them incremental improvements in functionality and positive experiences they value, you can finally move on to the Cadillac architecture.

https://pbs.twimg.com/media/DU4LBY-W0AA7eIb.jpg

Figure X – Minimum Viable Product (MVP)

#9 – Experience is beyond Functional – It Is Usable, Reliable, And Enjoyable

Last but not least: please, please, please avoid focusing only on the functionality. That is MIS thinking. Functionality can be basic; let the end user determine what they need to know. Furthermore, put the data in a platform where it is easy to access and discover, can be validated and trusted, is affordable to use, takes fewer clicks, and is reliable and enjoyable.

Figure X – Functional, Reliable, Usable, Enjoyable

In the end, users will choose fewer features that are available sooner and that are reliable, usable, and easy/enjoyable. They will not wait longer for more functionality that is clumsy, broken, and creates heartburn. That is why so many, including us, have fully embraced the MVP concept. So, ‘burn’ this picture into your head. In conclusion, stop the Red (left), and move to the blue (right).