by CData Software | May 20, 2022

Data Mesh: The Comprehensive Overview


What is data mesh?

Data mesh is a socio-technical data management paradigm proposed by Zhamak Dehghani. It challenges traditional monolithic, technology-focused data architectures with a business-centric, decentralized approach whose ultimate aim is to make data usable across the company. Data mesh rethinks the current enterprise data architecture in order to:

  • Make a large number of sources fully manageable
  • Enable a diverse set of consumers

Dehghani, who coined the term data mesh, defines the concept as follows: “A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance.”

Why data mesh evolved

The efforts behind data mesh aren’t new. As with data fabric, logical data warehouse, data lakehouse, and similar concepts, the ultimate goal of data mesh is to make data accessible and usable for all users, especially on the business side.

So, what makes data mesh different?

  • A mindset shift from data as an asset that is treasured to data as a product that is used
  • A data strategy that has moved from predominantly analytic visualization to data in motion and real-time solutions
  • A philosophical shift from a technology-driven approach to a technology-agnostic, user-centric one
  • Leveraging the power of community by building on the subject matter expertise of the different areas

Communities are big drivers of data mesh: they are effective at creating new data and partner ecosystems and at spreading the word about the concept. Data sharing, insight sharing, and the co-development of models, data-driven products, and services are all elements that laid the foundation of data mesh.

What causes considerable confusion is that data mesh moves the focus away from technology, which is often misread as meaning that technology can be neglected. But enabling data mesh still requires discussing technology: only with the right data integration, management, and governance tools can an organization enable a user-centric approach to data.

How data mesh works

To enable the paradigm shift of moving beyond the traditional monolithic approaches, the data mesh is based on four key pillars.


Domain-oriented decentralized data ownership and architecture

  • Decentralization and distribution of responsibility to the domains
  • Data should reside with, and be managed by, the people who are closest to it, as they are most familiar with it
  • Users can then make the adjustments in the most efficient way

Data as a product

  • Perspective needs to change from data being an asset to data being a product that can be discovered, understood, and trusted
  • Data consumers should be treated as its customers
  • Domain owners need to provide data that is of high quality, trustworthy, and available with a short time-to-market to satisfy these customers
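
The three qualities named above (discovered, understood, trusted) can be made concrete with a small descriptor that each domain publishes alongside its data. The following is only a minimal sketch: the `DataProduct` class, its fields, and the rules in `missing_qualities` are hypothetical choices for illustration and are not part of the data mesh definition or of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain publishes for one data product."""
    name: str
    domain: str                    # owning business domain
    owner: str                     # contact responsible for the product
    description: str = ""          # helps consumers understand the data
    schema: Dict[str, str] = field(default_factory=dict)  # column -> type
    quality_checked: bool = False  # set by the domain after its own checks

    def missing_qualities(self) -> List[str]:
        """Return which of the three product qualities are not yet met."""
        gaps = []
        if not (self.name and self.domain and self.owner):
            gaps.append("discoverable")
        if not (self.description and self.schema):
            gaps.append("understood")
        if not self.quality_checked:
            gaps.append("trusted")
        return gaps


# Illustrative values only; names and addresses are made up.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data@example.com",
    description="One row per order, refreshed daily.",
    schema={"order_id": "string", "amount": "decimal"},
    quality_checked=True,
)
draft = DataProduct(name="leads_raw", domain="marketing", owner="")

print(orders.missing_qualities())
print(draft.missing_qualities())
```

In this sketch a product with no gaps would be ready to offer to its consumers, while the draft still has to name an owner, document itself, and pass the domain's quality checks.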

Self-service data infrastructure as a platform

  • The platform needs to enable domains in a self-service manner
  • Required capabilities include data integration and transformation, implementation of security policies, data lineage, and identity management

Federated computational governance

  • Governance model with global standardization and harmonization is required
  • Standardization efforts can address concerns regarding semantic and syntactic data modeling, metadata formatting, identification management, etc.
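
"Computational" governance suggests that global standards are expressed as checks that can be run automatically against every domain's data products. The sketch below assumes one possible shape for this: each policy is a plain function over a product's metadata, and a shared `audit` function applies all policies. The policy names, metadata keys, and rules are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, object]
# A policy returns a violation message, or None if the product complies.
Policy = Callable[[Metadata], Optional[str]]


def require_owner(meta: Metadata) -> Optional[str]:
    return None if meta.get("owner") else "missing owner"


def require_pii_flag(meta: Metadata) -> Optional[str]:
    return None if "contains_pii" in meta else "contains_pii not declared"


def require_lowercase_name(meta: Metadata) -> Optional[str]:
    name = str(meta.get("name", ""))
    ok = name != "" and name == name.lower() and " " not in name
    return None if ok else "name must be lowercase without spaces"


# Global standards agreed across domains (hypothetical examples).
GLOBAL_POLICIES: List[Policy] = [
    require_owner,
    require_pii_flag,
    require_lowercase_name,
]


def audit(products: List[Metadata]) -> Dict[str, List[str]]:
    """Return violations per product name; compliant products are omitted."""
    report: Dict[str, List[str]] = {}
    for meta in products:
        violations = []
        for policy in GLOBAL_POLICIES:
            message = policy(meta)
            if message is not None:
                violations.append(message)
        if violations:
            report[str(meta.get("name", "<unnamed>"))] = violations
    return report


# Metadata as two domains might publish it (made-up values).
products = [
    {"name": "orders_daily", "owner": "sales", "contains_pii": False},
    {"name": "Customer Master", "owner": ""},
]
report = audit(products)
print(report)
```

The domains keep ownership of their data and metadata; only the rule set is shared, which is one way to read "federated" in this pillar.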

Data mesh vs. data fabric

“Data mesh is the approach; data fabric is the platform.” Although some public voices imply that data fabric and data mesh contradict each other, the two concepts actually lead in the same direction.

  • Data mesh is more philosophically and organizationally oriented, asking, “What’s in it for me?”
  • Data fabric is driven by the technical side, reflecting on the questions, “How do I build this, and which components do I need?”

Companies often start by deploying a data fabric and get some value from that first. With its priority on the flow and pathways of data, the data fabric defines the elements of data integration, schema governance, and the types of pipeline capabilities (ETL/ELT, data virtualization, microservices, APIs, streaming, etc.) in the context of SLAs and similar requirements. In the next step, data mesh adds the business outcomes and time to value.

“A data fabric is a technology-enabled implementation capable of many outputs, only one of which is data products. A data mesh is a solution architecture for the specific goal of building business-focused data products.”

Main advantages of data mesh

The data mesh concept differs from others in that it is technology-agnostic and focuses more on the human and social side of data management challenges. The data mesh framework claims that previous concepts concentrated too much on technology and thereby failed to fully understand and address the needs of the business that ultimately uses the data for insights.

With the decentralized approach, the data mesh bridges the gap between the business needs and the technology. On the technical side, it recognizes and respects the distributed nature and topology of the data and the different use cases that it can enable. On the human side, it looks at the individual personas of data consumers, their diverse access patterns, and their domain-specific knowledge.

| Pillar | Pro | Con |
| --- | --- | --- |
| Domain-oriented decentralized data ownership and architecture | Removes the bottleneck of centralized infrastructures: a separate entity that takes care of all tasks related to data management isn’t needed, e.g., when data scientists look for data in a data lake environment. | A decentralized approach can lead to more data silos and declining data quality, so data needs to be treated differently. |
| Data as a product | This is mainly a philosophical shift, so no technology has to be bought for it. | Changing people’s approach to data takes time. |
| Self-service data infrastructure as a platform | The business becomes less dependent on IT and is therefore more agile. | Managing data infrastructures is complex and requires special skills that won’t exist in all domains, yet the different domains still need to be enabled. |
| Federated computational governance | Ensures that rules and regulations are adhered to and that the company is compliant. | It is difficult to ensure a healthy and interoperable ecosystem in this decentralized setup. |


This framework certainly has a lot of potential. However, it is still at an early stage, with very few real-life implementations. Many weaknesses and challenges are still unknown or unclear. It will be interesting to see how it evolves in the next few years!

Use cases enabled by data mesh

All data-related use cases can benefit. On closer inspection, though, use cases involving analytical and transactional systems in which digital systems are embedded with intelligence are the best fit for data mesh. The following list gives some examples:

  • Customer 360: More automated processes provide a better personalized and contextualized customer experience. The results are reduced average handling time, increased first contact resolution, and improved customer satisfaction.
  • Marketing: Marketing teams can run targeted campaigns for the right customers, at the right time, and via the right channels.
  • Data privacy: Customer data can be protected by complying with the ever-emerging regional data privacy laws, like GDPR. Security rules can be applied on the global level through the integration of external data governance, policy, and security tools (such as Collibra) before the data is made available to data consumers in the business domains.
  • IoT device monitoring: Insights into device usage patterns help to continually improve product adoption and profitability.
  • AI and machine learning training: Machine learning (ML) and artificial intelligence (AI) models can be easily fed with data from different sources to help them learn, without running the data through a central place.
  • IT and DevOps: Data latency can be reduced by providing instant access to query data from proximate geographies without access limitations.
  • Loss prevention: Domain-oriented decentralization makes it possible to analyze data and run fraud detection models on a local level and thereby detect and prevent fraud in real time.
  • M&A: The combination of decentralized data ownership and federated governance allows data sovereignty and data residency on a regional level while complying with data governance rules on the global level.

Interested in seeing how you can get started?

Get a free trial to learn how you can enable data mesh with CData Virtuality.