The Various Roles That Act as Both a Checkpoint and a Helper Throughout the Lifecycle

In data analysis and consulting, many different jobs sit along what is known as the “data lifecycle.” The data lifecycle represents every stage in the life of data, from its creation to its dissemination. Here we discuss how various staff roles are used in a typical data lifecycle, and which roles are the most important.

The table below lists the major staffing roles found throughout the lifecycle, including the job title, the role type (mission portfolio, data portfolio, or data technology expert), and a brief description of each role.

Staffing Roles


Mission Portfolio Roles

Sponsor, Mission Subject Matter Expert (SME) – They know the questions they want to ask, what they do today, and what they want to do next.
Data Leadership Role – The representative of the data organization once it is made into a formal division or function. This includes responsibility for governance, policy, and data management leadership. Depending on maturity, this may be a Chief Data Officer, Data Management Lead, or Data Management Working Group Rotating Lead.
Business Analyst – Supports the Sponsor in documenting and detailing the questions being asked. Helps identify data sources, SMEs, and supporting material.
Data Catalog Expert – They know the data they want to use, have a general assessment of its readiness, and know who to talk to. They understand the metadata that needs to be present to make data usable and user-friendly.

Data Portfolio Roles

Data Suppliers – They provide the data and know their source.
Data Analyst – They provide the modeling analysis of the supply: the objects, fields, values, rules, and qualities needed.
Data Engineer – They provide the ELT data pipeline and maintain the pipeline and lake for the supplies. They also automate the supply of the data.
Data Architect – Designs the data system, outlines the dataflow, and defines and designs how and where the different roles use the system.
Data Scientist – They know how to use relational or AI/ML scripts to translate reference, temporal, financial, and/or geospatial data so that it can be integrated across supplies within the pipeline before it is made available to the enterprise. They understand methodology and have a toolchest of AI resources that can convert data into knowledge.
Data Developer – They make the final transformations that produce the end product: a service, product, map, app, API, etc. They also automate the delivery of the data products.

Data Technology Expert Roles

Data Solution Architect – Supports the requirements definition and leads the solution design analysis and recommendation for various platforms, flows, technology stacks, and planning, from MVP through features through architecture qualities.
Platform / App Developer – Develops end-user-facing tools for interacting with the data platform and/or creates data services for accessing the data.
Data Wrangler – Handles the likely large number of data quality issues, working for the data developer and with design roles to resolve them against the design solution's assumptions and the supplier. Usually requires strong transformation scripting skills.
Cloud DevOps – Initially, they design the cloud environment and deployment needed to facilitate the supply, developer, and user flows to meet service level expectations, using Infrastructure as Code.
Data Administrator – The data platform or data system administrator. They administer the data environment (lake, cube, fabric, warehouse, etc.), including monitoring of performance and data lifecycle service management.
Cloud DevOps Administrator – They provide security, IT compliance, and service level agreements for the managed cloud service environment.

So What Roles Help Out the Different Parts of Data Lifecycle Management the Most?

In order to answer that question, first we need to understand that there are typically three key phases of the data lifecycle: 

  • Planning – Focuses on customer relationship management for the needs and requirements lifecycle, cost-benefit analysis of data assets, source data acquisition and quality planning, and funding planning.
  • Production and Data Lifecycle Management – Covers process evaluation and improvement, improved automation in data pipelines (ETL, ELT), and end-to-end stewardship of the data (define, inventory/evaluate, obtain, access, maintain, use/evaluate, and archive). First and foremost, this group extracts source data, loads it, and executes transforms to support delivery’s goals.
  • Service Delivery – Creates data derivatives for delivery (downloads, services, packages, composite products), architects and supports service platforms and APIs, manages infrastructure to be FAIR (findable, accessible, interoperable, reusable), improves discovery, and encourages and enables community and collaboration.

To assign responsibility to each major role within these three parts of the data lifecycle, consider the concept of RASCI (Responsible, Accountable, Supports, Consults, Informs). A RASCI is a responsibility assignment matrix that clarifies the responsibilities of each role during preparation and implementation. The RASCI table below groups each role by responsibility: RA is primary, SCI is support-based. A cell containing an R, an A, or both means the job is a leadership role, while a cell containing an S, C, I, or any combination of them means the job is more of a knowledge/support role.

The graph below the table further categorizes roles by the part of the data lifecycle each one falls under.

Title – Planning, Production Operations, Delivery Services
Mission Portfolio Roles
Sponsor, Mission SME – RA
Data Mgmt Leadership Role – C
Business Analyst – SC
Data Catalog Expert – SCI
Data Portfolio Roles
Data Suppliers – CI
Data Analyst – CI
Data Engineer – RA
Data Architect – SC, CI, SC
Data Scientist – SCI
Data Developer – SCI, S
Technology Portfolio Roles
Data Solution Architect – CI, RA
Platform/App Developer – SCI
Data Wrangler – S
Cloud DevOps – SCI
Data Administrator – S
Cloud DevOps Administrator – S
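
The RA versus SCI grouping described above can be sketched in a few lines of Python. This is only an illustration, not part of any Xentity tooling: the names `RASCI_CODES`, `classify`, and `group_roles` are hypothetical, the codes are copied from the table with each role's phase cells run together, and it assumes a role counts as a leadership role if it holds an R or an A in any phase.

```python
# RASCI codes per role, copied from the table above with each role's phase
# cells concatenated. Which phase a code belongs to is not needed here.
RASCI_CODES = {
    "Sponsor, Mission SME": "RA",
    "Data Mgmt Leadership Role": "C",
    "Business Analyst": "SC",
    "Data Catalog Expert": "SCI",
    "Data Suppliers": "CI",
    "Data Analyst": "CI",
    "Data Engineer": "RA",
    "Data Architect": "SCCISC",
    "Data Scientist": "SCI",
    "Data Developer": "SCIS",
    "Data Solution Architect": "CIRA",
    "Platform/App Developer": "SCI",
    "Data Wrangler": "S",
    "Cloud DevOps": "SCI",
    "Data Administrator": "S",
    "Cloud DevOps Administrator": "S",
}


def classify(codes: str) -> str:
    """Return 'leadership' if any R or A appears, otherwise 'support'.

    Assumption: one R or A in any phase is enough to call the role leadership.
    """
    return "leadership" if set(codes) & {"R", "A"} else "support"


def group_roles(table: dict) -> dict:
    """Split the roles into leadership and support lists, keeping table order."""
    groups = {"leadership": [], "support": []}
    for role, codes in table.items():
        groups[classify(codes)].append(role)
    return groups


if __name__ == "__main__":
    for kind, roles in group_roles(RASCI_CODES).items():
        print(f"{kind}: {', '.join(roles)}")
```

Under that assumption, only the Sponsor, the Data Engineer, and the Data Solution Architect land in the leadership group; every other role is support-based.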

The Takeaway

Note that all four roles in the “mission portfolio” section are involved in the planning stage. Some would argue that without these roles properly planning the lifecycle of data, the entire operation would fall apart. Next we have the “data portfolio roles”, which are spread far more widely across the lifecycle than the mission portfolio roles. However, a good portion of them are involved as S, C, I, or some combination of the three. Finally, we have the “technology portfolio roles”, which are found more in production operations and delivery services. A good portion of them are also involved as S, C, and I, much like the data portfolio roles.

Following the logic that the more parts of the lifecycle a role helps out in (quantity), the more important it is, it would be easy to say that Data Architects are the most important role in the data lifecycle, because their responsibilities are spread across all three parts. Between this and the paragraph just before the graph, it’s ultimately a matter of opinion, so look back at the first table and decide for yourself who helps out the most.

Okay, But What Roles Help Out the Most During The Project’s ‘Data Lifecycle’?

We now look at the question of who helps out the most in the data lifecycle from a completely new perspective. At Xentity, we closely follow the concept of Discovery-Define-Design-Develop-Implement-Maintain. So, we will reexamine the roles with the same rules. In the table below, blue denotes a leadership role and green represents a support role. In the section that follows, we will examine another graph that asks whether you can say which role helps out the most from a quantitative perspective.

Quantitative Perspective

Mission Portfolio Roles
Sponsor, Mission SME – Mission, Mission
Data Mgmt Leadership Role – x, x, Mission, Mission
Business Analyst – x, Tech, x, x
Data Catalog Expert – x, x, x
Data Portfolio Roles
Data Suppliers – x, x, x
Data Analyst – x, x, x
Data Architect – Tech, x, Mission
Data Scientist – x, x, x
Data Developer – x, x, x, x
Technology Portfolio Roles
Data Solution Architect – x, Tech, Mission
Platform/App Developer – x, x, Tech
Data Wrangler – x, x, x
Cloud DevOps – x, x, Tech
Data Administrator – x, Data Tech
Cloud DevOps Administrator – x, x, IT Tech

The Takeaway

So, from a more quantitative perspective, Data Management Leadership, Business Analysts, and Data Developers arguably help the data lifecycle the most, because they have a role in four of the six stages of the development/implementation process we follow so closely at Xentity. As in the previous graph, certain roles are far more involved throughout the lifecycle than others. Alternatively, if you follow the idea that success starts at the top, you could argue (based on the table’s data alone) that Data Management Leadership is the most helpful: it is involved at four points in the development process, and two of those are leadership roles.

But, as with the previous graph, your mileage may vary. So, we encourage you to look at the first table once more. Using the graph and the description of each role, decide for yourself which role you think helps the data lifecycle the most.
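
The counting argument in this takeaway can be reproduced with a short Python sketch. It is an illustration only: the names `PHASE_CELLS`, `involvement`, `most_involved`, and `rank` are hypothetical, and the cell lists are one reading of the table above. It assumes that an "x" marks a support cell and that any other label (Mission, Tech, Data Tech, IT Tech) marks a leadership cell. The table does not show which of the six stages each cell belongs to, so only the counts are used.

```python
SUPPORT_MARK = "x"  # assumption: "x" is support, any other label is leadership

# Filled cells per role, as read from the table above (empty cells are absent).
PHASE_CELLS = {
    "Sponsor, Mission SME": ["Mission", "Mission"],
    "Data Mgmt Leadership Role": ["x", "x", "Mission", "Mission"],
    "Business Analyst": ["x", "Tech", "x", "x"],
    "Data Catalog Expert": ["x", "x", "x"],
    "Data Suppliers": ["x", "x", "x"],
    "Data Analyst": ["x", "x", "x"],
    "Data Architect": ["Tech", "x", "Mission"],
    "Data Scientist": ["x", "x", "x"],
    "Data Developer": ["x", "x", "x", "x"],
    "Data Solution Architect": ["x", "Tech", "Mission"],
    "Platform/App Developer": ["x", "x", "Tech"],
    "Data Wrangler": ["x", "x", "x"],
    "Cloud DevOps": ["x", "x", "Tech"],
    "Data Administrator": ["x", "Data Tech"],
    "Cloud DevOps Administrator": ["x", "x", "IT Tech"],
}


def involvement(cells):
    """Return (stages involved in, stages led) for one role."""
    leading = sum(1 for cell in cells if cell != SUPPORT_MARK)
    return len(cells), leading


def most_involved(table):
    """Roles that appear in the largest number of stages, in table order."""
    best = max(len(cells) for cells in table.values())
    return [role for role, cells in table.items() if len(cells) == best]


def rank(table):
    """Order roles by stages involved, breaking ties by stages led."""
    return sorted(table, key=lambda role: involvement(table[role]), reverse=True)


if __name__ == "__main__":
    print("Most stages:", ", ".join(most_involved(PHASE_CELLS)))
    print("Top after leadership tie-break:", rank(PHASE_CELLS)[0])
```

Counting stages alone yields a three-way tie, and breaking the tie by leadership cells puts Data Management Leadership first, which mirrors the two arguments made above. A different tie-break or weighting would give a different answer, which is the point of deciding for yourself.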

Each Role Matters

A lot goes into supporting the data lifecycle from beginning to end. Whether it is the Mission Portfolio Roles (which determine what kind of data is needed and what questions need to be asked), the Data Portfolio Roles (which provide ETL pipelines and microservices), or the Data Technology Experts (who provide data platforms and design environments for data), you can argue that some roles help the data lifecycle more than others, especially if you look at how often they are involved from a quantitative perspective. But make no mistake: each role is useful in the lifecycle.

At Xentity, we love data, and that is why we have often involved ourselves at various points in the data lifecycle.