Introduction
Organisations are relentlessly moving towards increasingly sophisticated data and data management scenarios resulting in challenging data requirements, demands for complex, highly managed data sets delivered to processing and analytics data platforms, and pressure from the business to exploit the outcomes.
Data leaders must work out how to weave data into this complex environment in ways that do not upset the apple cart, against a cacophony of cloud data platform vendors seeking to drive adoption and consumption of their own cloud data offerings. Data products promise to provide a significant addition to the data leader’s toolbox.
Although early in its evolution, the Enterprise Data Management Council’s CDMC+ working group for data products is preparing guidance for data leaders that is technology agnostic and designed to work within existing CDMC constructs. One of the key insights from this working group is that the potential range of data products is wide, and we must leverage existing experience and good practices from products in general and from the developing field of data products. To create a ‘factory’ that builds viable data products at scale, you first need a good framework that recognises lifecycles and supply chains.
With lifecycles and supply chains, you have data producers and consumers. At the production level, it’s helpful to think of the data product environment as an ‘exchange’. For data product consumers, there are guiding principles we should consider:
- Utility – does the product have a well understood and useful purpose?
- Quality – does the product have expected levels and dimensions of quality?
- Convenience – is the product convenient to handle and consume?
Starting with a Data Product Lifecycle Management Model that applies product thinking to a Data Asset Model (a collection of data under a particular management regime) and governs the Data Product Supply Chain Management Model, we can begin to look at how data can be supplied through a Data Product Model in a more systematic manner.
When we think of data products, what typically comes to mind is the business or end-consumer need that is being satisfied, the ultimate motivation for providing and consuming data products. Getting to that end point of consumption relies on a data product’s data resources becoming a managed and processable data asset, held within a local or accessible data infrastructure. Today, that process is often highly manual, inconvenient and not scalable.
Perhaps we can consider data products as a candidate to help scale end-consumption of data products. Consider three levels for data products:
- Consumed Data Products – Data that forms part of the overall data asset, supports a requirement and is moved or shared. It could be a data set, an analytics product, a semantic graph data product or even a data asset structure.
- Data Asset Products – A structure into which consumed data products are inserted, such as a component of a data mesh, data lake or data warehouse. This category may change as your perspective on how data products are consumed within your organisation evolves.
- Data Infrastructure Products – Data products that specify how to build the data infrastructure that delivers a data product. It might be a configuration spec conforming to a metamodel of data supply chain components (think ‘last mile’ builds of data products or those consumed via a marketplace).
In the Data Product Model, a data product is specified through collections of metadata wrapping a data asset. The model clearly separates the “consumed” data resource from the “produced” data set, in that metadata is appended to support its movement and consumption throughout the supply chain, which in turn evolves the data product. For example, the producer creates the data product itself which may just contain metadata objects, but there is no context yet. Additional production metadata can be added such as status, quality or other information.
The key things to remember in this abstraction of a data product are:
- A data product evolves. It moves through steps in a process from production to consumption, and at each step it’s helpful to have context. Metadata appended to record state changes, design changes, or even the size of the dataset and who owns it makes tracing the components of the data product easy.
- A data product is dynamic and cannot live in isolation. Even if the design of a data product is static, the very nature of a data product is that the output can change if the data asset changes. One of the big issues that practitioners miss is knowing where and when a blocker arose in the supply chain, and who created it – that is why a standardised metadata model for data products makes observability and bug fixing possible.
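The two points above can be illustrated with a minimal Python sketch. It is not the CDMC+ model itself: the class names, field names and step names (`produce`, `bind`, `consume`) are assumptions chosen for illustration. It shows a data asset reference wrapped in an append-only metadata history, so that the "where, when and who" of a blocker can be read back from the product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MetadataEvent:
    """One metadata record appended when the product passes a supply chain step."""
    step: str            # hypothetical step name, e.g. "produce", "bind", "consume"
    actor: str           # who performed the step
    status: str          # "ok" or "blocked"
    recorded_at: datetime
    note: str = ""


@dataclass
class DataProduct:
    """A data asset reference wrapped in an append-only metadata history."""
    name: str
    owner: str
    asset_ref: str       # pointer to the underlying data asset (illustrative)
    history: list[MetadataEvent] = field(default_factory=list)

    def record(self, step: str, actor: str, status: str = "ok", note: str = "") -> None:
        # Metadata is appended, never overwritten, so the product's evolution stays traceable.
        self.history.append(
            MetadataEvent(step, actor, status, datetime.now(timezone.utc), note)
        )

    def first_blocker(self) -> Optional[MetadataEvent]:
        # Returns the earliest blocked step, or None if nothing was blocked.
        return next((e for e in self.history if e.status == "blocked"), None)


product = DataProduct(name="fx-rates", owner="treasury-data", asset_ref="warehouse.fx_rates")
product.record("produce", actor="producer-team")
product.record("bind", actor="platform-team", status="blocked", note="missing credentials")
product.record("consume", actor="analyst")

blocker = product.first_blocker()
print(blocker.step, blocker.actor, blocker.note)
```

Keeping the history append-only is a design choice made for this sketch: it trades storage for the ability to trace every state change back to a step and an actor.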
Understanding the data product supply chain
Stage 1: Ideate, Define, Develop
At the consumer level, where demand for data products originates, there needs to be a clear and dependable process for ideation or the expression of demand, and a clear process for defining and developing a data product that sits within the governance guardrails. The definitions are tested for fit (commercial, consumer, market, etc.) and then made public with a baseline.
Like you, dear data leader, we love metadata as much as the underlying data assets, so ensure that you are thinking about how your data product will convey its benefits and delight consumers, and how its usage will be articulated, reported and acted upon.
Remember the Steve Fisher Axiom: “if you don’t know how your data product is being used, you might as well go and put the kettle on and meditate upon your life choices”.
Are you ready to commit? The data product idea, backed up by the business case, leads to the design, development and testing of the product, along with the means to build and support it. It’s a good thing to be sure at this stage.
In software development, the common term is a ‘Product Requirements Document’ (PRD) which, in this case, may be templated as a Data Product Requirements Document. Make sure your Data PRD defines the metadata that will be used to manage your data product!
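As a rough illustration of that check, the sketch below tests whether a draft Data PRD declares the management metadata you expect. The required field names are assumptions drawn loosely from the themes in this article (ownership, purpose, quality, status, usage reporting), not a CDMC+ standard, and the PRD structure is hypothetical.

```python
# Hypothetical minimum set of management metadata; adjust to your own standards.
REQUIRED_METADATA = {"owner", "purpose", "quality_dimensions", "status", "usage_reporting"}


def missing_metadata(prd: dict) -> set[str]:
    """Return the required metadata fields that the Data PRD does not define."""
    # A field counts as declared only if it is present and non-empty.
    declared = {name for name, value in prd.get("metadata", {}).items() if value}
    return REQUIRED_METADATA - declared


draft_prd = {
    "title": "Forecast reconciliation data product",
    "metadata": {
        "owner": "finance-data-team",
        "purpose": "Reconcile forecast and actual transactions",
        "quality_dimensions": ["completeness", "timeliness"],
        "status": "",  # present but empty, so it is reported as missing
    },
}

gaps = missing_metadata(draft_prd)
print(sorted(gaps))
```

A check like this could run as part of the review that signs off the PRD, so gaps in management metadata surface before the product is built.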
We now have the data product ready to roll out, waiting to be assembled.
Stage 2: Produce, Procure, Bind
The defined product is made available for consumption, but that doesn’t necessarily mean there is any actual data available at this stage. What is produced could be a model or other definitional metadata that allows a third party to produce a consistent and standardised dataset.
There are a few more processes that the data product should go through before it passes muster, namely, that there should be data within the product that has a clear owner responsible for its quality and reliability. In short, a good data product should give consumers answers that support business decisions; it should be a reliable and trusted user experience.
Once a data product exists, it can be made ready for use by a consumer via the appropriate ‘exchange’ mechanism and credentials for authentication. It is a worthwhile investment to include some testing to ensure that a complex backend does not hinder a simple front-end user experience. Nothing kills adoption faster than a terrible end user / consumer experience.
A data product may not be native to the underlying data landscape. It must therefore be bound to the local data infrastructure which is often a last-mile step of the delivery chain. A good way of approaching how to bind a data product to local data infrastructure, and report on the status of how a data product is moving across the supply chain, is to think about buying your favourite can of beans in the local corner store, knowing that you have a can opener at the ready indoors.
Use Case for Context: Reconciling Transactions with a specific data product
Imagine a scenario where you have forecasted and actual financial transactions that require reconciliation. Within the data product is a data asset comprising forecasted financial data from Anaplan and actual financial transactional data from SAP. Your data product in this scenario allows a financial analyst to search, match and validate the forecast with the real transaction, and if invalidated, analyse by how much the forecast was out for that period.
In this circumstance, the data product is not read only. An action by the user / consumer may write changes back to the underlying data asset or update a data model with new parameters.
This scenario is based on a real-life example. It became complicated when a third party (an outsourced management accounting firm) carried out this action on behalf of the organisation that owns the data product. The data from Anaplan and SAP needed to be ingested into a cloud data warehouse, and then a secondary ETL pipeline was run to the data product, which enabled users / consumers to authenticate, log in and carry out the reconciliation process. The state of that model was continuously revised and had to be written back to the data warehouse so that a second data product, a dashboard, could be updated.
Simply put, the data product could not survive without being tightly bound to the local data environment. It needed to be integrated. It needed to be owned, maintained, and improved.
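The matching and validation step in this use case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the real-life example: the keys (cost centre and period), the amounts and the zero tolerance are all hypothetical, and real extracts from Anaplan and SAP would arrive via the warehouse pipelines rather than as in-memory dictionaries.

```python
from decimal import Decimal

# Hypothetical extracts, keyed by (cost_centre, period).
forecast = {
    ("CC100", "2023-09"): Decimal("1200.00"),
    ("CC200", "2023-09"): Decimal("800.00"),
    ("CC300", "2023-09"): Decimal("450.00"),
}
actual = {
    ("CC100", "2023-09"): Decimal("1200.00"),
    ("CC200", "2023-09"): Decimal("950.50"),
}


def reconcile(forecast, actual, tolerance=Decimal("0.00")):
    """Match forecast to actual by key and report the variance for each key."""
    results = []
    for key in sorted(forecast.keys() | actual.keys()):
        f, a = forecast.get(key), actual.get(key)
        if f is None or a is None:
            # One side is missing, so there is nothing to validate against.
            results.append((key, "unmatched", None))
            continue
        variance = a - f  # how far the forecast was out for that period
        status = "validated" if abs(variance) <= tolerance else "invalidated"
        results.append((key, status, variance))
    return results


report = reconcile(forecast, actual)
for key, status, variance in report:
    print(key, status, variance)
```

In a writable data product, a report like this is what would be written back to the warehouse so that downstream products, such as the dashboard, can pick up the revised state.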
Now, the production lines are built, a data resource is ready to be consumed and applied to a data value use case.
Stage 3: Consume, Organise, Act
Now that the data is bound to the data infrastructure and the systems of reporting and governance, it can be processed and handled as any other asset within the organisation. Additional metadata can provide instructions for use, support the monitoring of data integrity and the overall health of the data product, and describe how information can be shared. Some forms of information sharing can be standardised and the “product support” capability / function can begin to mature.
At its core, this is a framework for understanding the data product supply chain, and it is up to you to establish your own product management lifecycle model and data product standards to deliver the next evolution of data valorisation within your organisation. As David Brandt, an Ohio farmer turned internet meme sensation, rightly pointed out, “It ain’t much but it’s honest work.” And with that spirit, as pragmatic practitioners tempered by geeky enthusiasm for the future, we look forward to supporting your journey to success in data products and enterprise data management.
After use, are the data resources or the product fit for purpose? Time to re-start the cycle!
Next steps
If you missed our live event in October, you can watch ‘Demystifying Data Products’ on-demand. This article contains a lot of work-in-progress thinking by the CDMC+ working group on data products and as such is subject to change without notice. If you’re curious about the latest in best practices for data products or data product management, do get in touch.
Webinar
We discussed this topic – and much more! – with our excellent guest speakers in the 4th online event of our Data Leadership Series. Catch up now!