Using Portfolio Concepts to Deal With Changing Data Solutions

How would you like it if your tools changed every two years?

In 2018, the construction sector employed almost 11.2 million people in the United States. Imagine if the tools you use for construction changed every 2 years. Think of the re-training, the certifications, and the physical risks to trade workers of adapting to new tools. Think of the business implications for capital, job length, insurance, bonding, safety codes, and enforcement. Then there are the business model implications: theoretically, you only complete around 4 contracts before all your tools change. Sometimes you have more tools at your disposal. Other times the tools you use become obsolete and you need new ones. When your tools change every 2 years, you live in a constant period of adjustment, one where you keep rediscovering what is new and what is now obsolete.

Of course, that is not what happens in long-term and physical industries. They change more like every 10 to 20 years.

This is the Reality of Data and Computing.

Meaning, while the technophiles and vendors love the growth and new development, the investors, managers, and business side generally aren’t as much of a fan. Indeed, in our data-computing world, this is our life. Data is changing its role and value in society. This also means growth. The statistics are obscene: 2-3 exabytes of data are generated a day. By the commonly cited estimate, 90% of the world’s data was created in the last 2 years. The world is nearing 50 zettabytes in total (aka 50 trillion gigabytes). And there are so many more stats on the growth. You get the gist.

We’re mainly seeing this data come from human- and machine-generated sources. In other words, what used to be data exhaust: social signals (social media, web searches), denser media (photos, video), and verbose workflow tracking (web server logs, system clicks and status). Soon, add training data for AI, even more life intelligence statistics, transaction integrity/fraud signals, and other big data inputs.

Most know of Moore’s law – digital computing power doubles roughly every 2 years. Unfortunately, the computing power gained is usually lost to gluttonous software eating up the gains, with its demands doubling every 18 months (Gates’s, Page’s, and Wirth’s laws). Yet, even though software demands outpace hardware gains, each hardware gain makes it possible to handle more connections, more network, and more storage, and thus brings new architectures.
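
To make that arithmetic concrete, here is a rough sketch in Python. It takes the two doubling periods quoted above at face value and assumes both curves are smooth exponentials, which real hardware and software are not. The function names `growth_factor` and `net_headroom` are ours, for illustration only.

```python
# Illustrative arithmetic only: the doubling periods are the figures quoted
# above, and treating both as smooth exponentials is a simplifying assumption.
HARDWARE_DOUBLING_MONTHS = 24  # Moore's law, as stated in the text
SOFTWARE_DOUBLING_MONTHS = 18  # software demand, as stated in the text


def growth_factor(months: float, doubling_months: float) -> float:
    """Return how many times a quantity has multiplied after `months`."""
    return 2 ** (months / doubling_months)


def net_headroom(months: float) -> float:
    """Hardware gain divided by software demand.

    A value below 1.0 means software demand has grown faster than hardware.
    """
    hardware = growth_factor(months, HARDWARE_DOUBLING_MONTHS)
    software = growth_factor(months, SOFTWARE_DOUBLING_MONTHS)
    return hardware / software


if __name__ == "__main__":
    for years in (2, 6, 12):
        months = years * 12
        hardware = growth_factor(months, HARDWARE_DOUBLING_MONTHS)
        software = growth_factor(months, SOFTWARE_DOUBLING_MONTHS)
        print(
            f"{years:>2} years: hardware x{hardware:.1f}, "
            f"software x{software:.1f}, "
            f"net headroom x{net_headroom(months):.2f}"
        )
```

Under these assumptions, after 6 years the hardware is 8 times more powerful while software demand is 16 times higher, so the net headroom is cut in half.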

So, think of the various data architectures out there. Variations run from logs to content to indices to relational databases to object databases to NoSQL search indices to native data stores (lakes) to graph databases to neural networks. And that is only for data at rest. For data in motion, there are all sorts of data pipeline patterns, streaming hubs, IoT edge data processing, training data cleaning engines, and more.

A Follow-Up: “How Could Your Approach to Managing Such Fast-Changing Chaos be Different Than Traditional Low Tool Change?”

So, if we focus on the latter audience (the investors, managers, and business side), we’ll assume the answer is “we don’t like change that fast all that much,” which means the organization’s motivation to change is low. That means a clearer roadmap is likely needed to reduce the waste. With that, we’ll follow up with this next question (one of many): “How should your approach to managing such chaos be different when changing that fast?”

This is our reality, and thus our clients’ reality. The proverbial ‘data explosion’ creates the challenge of making the data more accessible, more integratable, and more dense so that APIs, analytics, and AI can take advantage of it all.

First Step We Recommend – Use a Different Portfolio Approach

Right. Moving forward. What we do is break our architecture into 3 parts (with a 4th for the future): Data, Information, Knowledge, and, in the future, Wisdom:

Data asks questions of the past: “What happened?” It is simple, discrete, and predictable, and is generally used for reporting.
Information also asks questions of the past: “Why did it happen?” It combines past and present data with static data into complicated, yet organized, structures capable of enterprise support of policy and compliance workflow analytics.
Knowledge asks questions yet to be thought of, along the lines of the now and future: “What is and will be happening?” This adds complexity and requires structures to change and adapt without losing data fidelity, and thus requires fluid data relationships to provide decision support modeling, analytics, and monitoring.
Wisdom solutions are all future based: “What should we do?” This is that AI agent we all want, one that shrinks the confidence intervals in chaotic, uncertain, and varying patterns and predicts from trends too complex for humans or processes to decipher.
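
One way to keep the four groups straight is to write them down as a small lookup. The sketch below only restates the list above in Python. The names `Group`, `GroupProfile`, and `group_for_question` are hypothetical, and the one-line summaries are our shorthand rather than any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    """The four portfolio groups described above."""
    DATA = "Data"
    INFORMATION = "Information"
    KNOWLEDGE = "Knowledge"
    WISDOM = "Wisdom"


@dataclass(frozen=True)
class GroupProfile:
    question: str     # the guiding question the group answers
    time_focus: str   # where in time the question points
    typical_use: str  # shorthand summary (our wording, an assumption)


PROFILES = {
    Group.DATA: GroupProfile("What happened?", "past", "reporting"),
    Group.INFORMATION: GroupProfile(
        "Why did it happen?", "past",
        "policy and compliance workflow analytics",
    ),
    Group.KNOWLEDGE: GroupProfile(
        "What is and will be happening?", "now and future",
        "decision support modeling, analytics, and monitoring",
    ),
    Group.WISDOM: GroupProfile(
        "What should we do?", "future", "AI-assisted prediction",
    ),
}


def group_for_question(question: str) -> Group:
    """Find the group whose guiding question matches (case-insensitive)."""
    wanted = question.strip().lower()
    for group, profile in PROFILES.items():
        if profile.question.lower() == wanted:
            return group
    raise KeyError(f"No portfolio group asks: {question!r}")


if __name__ == "__main__":
    for group, profile in PROFILES.items():
        print(f"{group.value:<12} {profile.question:<32} ({profile.time_focus})")
```

Exact-match lookup is deliberately naive. In practice, sorting a real business question into a group is a judgment call made by people, not a string comparison.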

The Challenges of These Data Portfolios Vary Greatly

Each area has its own challenges.

Data – This portfolio has changed greatly to support FAIR (findable, accessible, interoperable, reusable) data principles, enabling more network, security, and parallel connection access to burst or cloud computing.

Information – While the information domain has had the same patterns for the last 30 years, the code changes every 4 years, and thus systems need to be replaced every 10 years.

Moving into the newer architectures: Knowledge – This is the most disruptive, as most initial forays tried to pull data directly from MIS and warehouse solutions that supported compliance questions, yet not truly the mission questions. This is due to structured data limitations: the data stored limited what questions could be asked. Most organizations are realizing this and planning out their next 5 to 10 years of investment. However, it is tricky because you have to ask: do you go graph, lake, semantic, streaming updates, or skip right to AI?

Wisdom – Only the leading-edge and highly capitalized firms are going in this direction. Back in September, we talked about how much AI is growing as a technology and as an industry. It’s mostly research and gold-rush prospecting. Some AI is working great. However, be ready to throw away the architecture in the years to come, especially with the confluence of quantum computing coming!

This presents a few interesting conundrums for companies and organizations in the 21st century. So much changes (or, to be more accurate, grows) every two years. Consequently, we constantly adjust and update our practices and technology to properly parse and use data. We also have to face the reality that even the best supercomputers lack the capacity to store all this data. The biggest conundrum of all is how to organize your groups differently to manage these ‘portfolios’.

This Thinking Provides Us Portfolio Groups of Data Architectures

Using these or similar portfolio management constructs – much like how IT Services uses TBM (Technology Business Management) – helps in handling disruption because, generally speaking, the portfolios create a level of disruption isolation between the solution groups. For example, a new tool in the Data portfolio can have its changes isolated from Knowledge. Consequently, adaptation can be more predictable and planned out to handle this rapid change. This means that each group can plan out its architectures, research, and capital, and coordinate governance, even separately. We should note this is also ‘lazily speaking’, as some tools do cut across multiple domains. So for a semblance of enterprise architecture and governance that is tied to mission decisions, an investment percentage breakdown across the groups is key.

The groups help us size our investments, knowing what size, computing, network, and complexity are needed. Likewise, different skillsets are needed in each group: as with architectural patterns, each group has different needs. However, there is no industry approach for data the way TBM is for IT Services. TBM has made a start, yet, in our opinion, it does not address the scope of data solutions quite enough, as it approaches them a bit more from the IT hosting point of view vs. the instantiation of actual data tools, pipelines, services, apps, visualization, streams, etc. Also, it assumes data solutions are run under CIO services, which is not the case for all data solutions. TBM may move to a new version soon, which may address this, as the current version dates from 2018; until then, most data portfolios are left wanting.

An Example of How Portfolio Groups and Architectures Could Be Grouped

Type	Current Architectures
Data (Inputs: raw feeds, simple datafiles, logs, content, simple queries in DB or warehouse)	Data Pipeline Processing – ELT/ETL Data Management Pipelines, Data Microservices, Database Connectors; Enterprise Data Hub Solutions – Data Microservices Platforms, Real-Time Data Hubs, Geospatial Data Services (WMS, WFS, WCS, REST/XML Web Service APIs); Enterprise Data Catalog – Data Asset Discovery of APIs, Apps, Data, Services, Microservices, Pipelines, GeoData Products/Web Services, Ontologies, Analytics Models, and Metadata Configuration Management Systems
Information (relational databases, warehouse, math, MIS, GIS, Temporal/Time-Series Data, data services, and early standards)	MIS Systems – Relational Databases, Data Product Generation (e.g. Enterprise Service Buses, Electronic Data Interfaces, Workflow Platforms, Complicated and Transactional Web Services); Micro-Service Platform Architecture – Information Exchange Buses between systems, System Application Programming Interface needs, API-to-API and ESB solutions, Semantic integration services; Geospatial Service Platforms – GIS supply workflows, Geospatial Data Delivery and App Platforms
Knowledge (engineering data, semantic data, time-series and GIS integration, detailed attributes and at times higher performance capacity, quality data, relationship understanding, RDF)	Data Warehouses & Beyond – Data Warehouses, Data Marts, Data Lakes; Data Analytics Platform – Data Analytics Transformation Environments, BI & Dashboard Platforms, Analytics Workbenches, Intelligent Sensor feed pre-processing
Wisdom (more prediction/solution; relies on natural language processing, higher order linked data, fuzzy logic, interpretive signals, artificial intelligence, sentiment analysis, and low-level atomic big data analytics)	Massive Data Training; Massive streaming aggregations; A/B Signal testing & acceptance/rejection lifecycle; Multiple-ontological and linked data solutions; Training Data scheduling & tracking; Enterprise Data Analytics Platform; AI & Machine Learning Platforms; Scientific Workbench platforms; Growing area – so much more!
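
As a rough illustration of how a grouping like the table above could be put to work, the Python sketch below tags a handful of investments by group and architecture, then computes an investment percentage breakdown. Everything in it is hypothetical: the investment names and costs are made up, the choice to treat the catalog as cutting across Data and Knowledge is ours, and splitting a cross-cutting tool's cost evenly between its groups is a simplifying assumption rather than a recommendation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

GROUPS = ("Data", "Information", "Knowledge", "Wisdom")


@dataclass(frozen=True)
class Investment:
    name: str                # hypothetical tool or project name
    architecture: str        # architecture tag taken from the table above
    annual_cost: float       # made-up figure, in any one currency
    groups: tuple[str, ...]  # one group, or several for cross-cutting tools


def investment_share(investments: list[Investment]) -> dict[str, float]:
    """Return each group's percentage of total spend.

    Assumption: a cross-cutting investment is split evenly across its groups.
    """
    totals: dict[str, float] = defaultdict(float)
    for inv in investments:
        for group in inv.groups:
            if group not in GROUPS:
                raise ValueError(f"Unknown group {group!r} on {inv.name!r}")
            totals[group] += inv.annual_cost / len(inv.groups)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {group: 0.0 for group in GROUPS}
    return {group: 100 * totals[group] / grand_total for group in GROUPS}


def affected_groups(investments: list[Investment], architecture: str) -> list[str]:
    """List the groups that must plan for a change in one architecture."""
    return sorted(
        {g for inv in investments if inv.architecture == architecture for g in inv.groups}
    )


# Entirely invented sample portfolio, for illustration only.
PORTFOLIO = [
    Investment("pipeline-refresh", "Data Pipeline Processing", 400_000, ("Data",)),
    Investment("mis-upgrade", "MIS Systems", 300_000, ("Information",)),
    Investment("lake-pilot", "Data Warehouses & Beyond", 200_000, ("Knowledge",)),
    Investment("catalog", "Enterprise Data Catalog", 100_000, ("Data", "Knowledge")),
]

if __name__ == "__main__":
    for group, pct in investment_share(PORTFOLIO).items():
        print(f"{group:<12} {pct:5.1f}%")
    print(affected_groups(PORTFOLIO, "Enterprise Data Catalog"))
```

The second helper shows the isolation idea in miniature: a change to one architecture only pulls in the groups tagged on it, so a single-group tool disrupts one portfolio, while a cross-cutting tool flags every group it touches.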

To Put a Bow on This

By grouping your data tech investments into the DIKW (Data, Information, Knowledge, Wisdom) groups and tagging them by architecture, you provide more transparency and more clarity, with no additional staff on the first iteration. Also, you can organize your data tech leadership, governance, and skillsets to limit disruption. Furthermore, you can plan out individual research efforts to keep up with the changing pace. Finally, you can plan out capital and projections by each group. And you do so by having enterprise leadership (CDO, CTO, and CIO) tie it all together into mission needs.