Your expert for questions
Martin Whyte
Partner in the Data & Analytics division at PwC Germany
Data provides the basis for business process automation, AI solutions and data-driven business models. The more data-driven use cases an enterprise implements, the more the weaknesses of purely central data platform approaches such as the data warehouse or data lake emerge: their scalability is limited by organizational bottlenecks in central platform teams. It is no wonder that many companies are currently rethinking their data management architecture and reorganizing their data engineering and data science teams. One new but promising paradigm is the data mesh, which combines decentralized data engineering with central governance and platform components.
“To make better use of their data, companies need to apply new technical and organizational principles for data management. This requires a strategy which also takes into account the change impact on the entire organization.”
The technical infrastructure and organizational approaches for handling data are entering the next stage of their evolution. Since the 1990s, companies have held data for analytical purposes in central database platforms known as data warehouses. The concept of the data lake emerged in response to rapidly growing data volumes: it provides storage and processing for data of any kind and is often operated in parallel to a data warehouse, sometimes integrated with it as a data lakehouse.
The growing number of promising use cases for data-driven solutions exposes the weaknesses of centralized approaches: they cannot keep pace with use cases spread across many business domains.
To create greater business value from data, data & analytics leaders aim to reduce bottlenecks and decentralize their data platforms and teams. Three principles motivate the current evolution of the data & analytics area.
The data mesh addresses these requirements with a new architectural paradigm for data platforms. Only 2 percent of surveyed IT managers do not anticipate any benefits from implementing a data mesh.
Almost 70 percent of those surveyed expect the concept to change their company’s data architecture and technology. Nevertheless, only one third also expect the working culture to change, although the organization is a key aspect of the data mesh concept.
Large companies are pioneering the introduction of the data mesh: 60 percent of companies with more than 1,000 employees have already at least partially established data mesh principles, while the share among smaller companies stands at only 34 percent. One in ten companies has discussed the concept internally but does not currently plan to implement it.
There is also room for improvement in self-service: around 90 percent of companies have specialized data analysis teams that serve the business, but only 8 percent give all employees the opportunity to conduct their own data analyses.
Data is owned, processed, and used in decentralized domains
The company defines data domains that are responsible for data objects in their areas. Each domain has a data architect.
Each domain creates and shares valuable data products across the enterprise
Data products are developed once and shared across the entire organization so they can be re-used by many people. They comply with data integration and quality standards and are easy to use.
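To make the idea of a standardized, easily usable data product concrete, the following minimal sketch models a data product "contract" that a domain team might publish alongside its product. All field and function names here are illustrative assumptions, not a defined standard.

```python
from dataclasses import dataclass, field

# Illustrative metadata contract for a data product: what consumers
# need in order to discover, trust and integrate it. Field names are
# assumptions for this sketch, not a fixed schema.
@dataclass
class DataProductContract:
    name: str
    owner_domain: str
    schema: dict            # column name -> expected Python type
    freshness_hours: int    # maximum age before the product counts as stale
    quality_checks: list = field(default_factory=list)

def validate_record(contract: DataProductContract, record: dict) -> list:
    """Return a list of violations of the contract's schema."""
    violations = []
    for column, expected_type in contract.schema.items():
        if column not in record:
            violations.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            violations.append(f"wrong type for {column}")
    return violations

contract = DataProductContract(
    name="customer_orders",
    owner_domain="sales",
    schema={"order_id": str, "amount": float},
    freshness_hours=24,
)
print(validate_record(contract, {"order_id": "A-17", "amount": 12.5}))  # []
print(validate_record(contract, {"order_id": "A-18"}))  # missing 'amount'
```

In practice such contracts would be stored in the central data catalog and checked automatically whenever a domain publishes a new version of its product.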
Data products are created and operated by autonomous, interdisciplinary teams
Behind each data product is a product team in the data domain. It consists of a product owner, business process experts, data engineers and, if required, data scientists.
Data product teams have professional DataOps practices
Data product teams use DataOps – a set of practices that ensure efficient data operations and high-quality data products. DataOps applies agile and DevOps principles to data engineering & analytics.
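One typical DataOps practice is an automated quality gate that runs in the delivery pipeline before a data product is published, analogous to a unit test in DevOps. The sketch below shows such a gate for missing values; the threshold and field names are illustrative assumptions.

```python
# A minimal sketch of an automated data quality gate, the kind of
# check a DataOps pipeline might run in CI before publishing a data
# product. Thresholds and field names are illustrative assumptions.
def null_rate(rows, column):
    """Share of rows where the given column is missing or None."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows) if rows else 0.0

def quality_gate(rows, column, max_null_rate=0.05):
    """Return (passed, observed_rate); a failed gate blocks publication."""
    rate = null_rate(rows, column)
    return rate <= max_null_rate, rate

rows = [{"customer_id": "C1"}, {"customer_id": "C2"}, {"customer_id": None}]
passed, rate = quality_gate(rows, "customer_id", max_null_rate=0.05)
print(passed, round(rate, 2))  # one of three values missing -> gate fails
```

Dedicated data validation frameworks offer richer versions of this pattern, but the principle is the same: quality rules are code, versioned and executed automatically.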
All metadata about data and use cases is managed in a central data catalog
The company sets up a central data catalog to create visibility for data products as well as raw data. It also forms the technical backbone for a data marketplace and a virtual data integration layer.
Data products can easily be found and accessed through a data marketplace
A central data marketplace makes it easier for users to find and use data products. The marketplace works similarly to an online shop and, in addition to a search function, also includes a recommendation system.
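The "online shop" analogy can be illustrated with a simple keyword search over catalog entries, the core of such a marketplace. The entries and fields below are illustrative assumptions, not the API of any specific product.

```python
# A minimal sketch of a data marketplace search over catalog entries,
# working like an online-shop keyword search. Entries and fields are
# illustrative assumptions.
CATALOG = [
    {"name": "customer_orders", "domain": "sales",
     "tags": ["orders", "revenue", "customers"]},
    {"name": "shipments", "domain": "logistics",
     "tags": ["delivery", "orders"]},
]

def search(keyword):
    """Return catalog entries whose name or tags match the keyword."""
    kw = keyword.lower()
    return [e for e in CATALOG if kw in e["name"] or kw in e["tags"]]

print([e["name"] for e in search("orders")])  # both products are tagged
```

A recommendation system would extend this by ranking results using usage statistics, for example suggesting data products that are frequently consumed together.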
Access to data products is mostly virtualized
To hide the technical complexity of the distributed data products from users, the central data platform provides a partially virtualized integration layer. It makes the locally managed data products accessible from the central data marketplace.
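The idea of virtualized access can be sketched as a central facade that resolves a product name to its owning domain and fetches the data from there, so consumers never need to know where the data physically lives. The in-memory "domain stores" below stand in for real distributed systems and are purely illustrative.

```python
# Sketch of a virtualized integration layer: a central facade routes
# each query to the domain that locally manages the requested data
# product. The dictionaries stand in for real distributed stores.
DOMAIN_STORES = {
    "sales": {"customer_orders": [{"order_id": "A-17", "amount": 12.5}]},
    "logistics": {"shipments": [{"shipment_id": "S-1", "status": "sent"}]},
}

PRODUCT_INDEX = {  # catalog information: which domain owns which product
    "customer_orders": "sales",
    "shipments": "logistics",
}

def query(product_name):
    """Resolve a product to its owning domain and fetch it from there."""
    domain = PRODUCT_INDEX[product_name]
    return DOMAIN_STORES[domain][product_name]

print(query("shipments"))  # fetched transparently from the logistics domain
```

Because the routing table is derived from the catalog, moving a data product to another domain changes only the index entry, not the consumers.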
Data governance rules are centrally enforced for all data products
The organization establishes the role of a data governance officer, who defines data guidelines and aligns them with the data domain architects and data product owners.
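Central enforcement means that every data product's metadata is validated against the same guidelines, regardless of which domain operates it. The rules and fields in this sketch are assumptions chosen for illustration.

```python
# Sketch of centrally enforced governance rules: each rule is a named
# predicate over a data product's metadata, applied uniformly to all
# domains. Rule names and metadata fields are illustrative assumptions.
GOVERNANCE_RULES = [
    ("has_owner", lambda meta: bool(meta.get("owner"))),
    ("is_classified", lambda meta: meta.get("classification")
        in {"public", "internal", "confidential"}),
]

def enforce(meta):
    """Return the names of all governance rules the metadata violates."""
    return [name for name, check in GOVERNANCE_RULES if not check(meta)]

print(enforce({"owner": "sales", "classification": "internal"}))  # []
print(enforce({"owner": ""}))  # both rules violated
```

In a real setup such checks would run automatically in the platform, for example whenever a product is registered in the data catalog.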
The introduction of a data mesh does not only change the data architecture and technology. It affects the entire organization and should therefore be driven not just by the IT department but also by the business divisions. To set clear goals, companies should develop a data mesh strategy as a first step. The second step should begin only once corporate management has clearly committed to the strategy and it has been communicated within the organization.
The second step involves defining data domains for the individual business segments and embedding responsibility for the data within those domains. An agile approach is recommended, allowing successive expansion with additional domains. Providing the first data products lets the organization gain experience with the composition of interdisciplinary product teams. During this phase, a central platform team should create a data catalog and implement the first automated governance rules.
In a third step, additional DataOps practices should gradually find their way into the work of the data product teams, increasing their efficiency and successively improving the quality of the data products. Building on the data catalog, metadata management should be expanded so that it automatically recognizes new patterns and relationships between data points. The objective is to offer data users an ever better integrated data landscape that supports decision-making in the operating business.
To increase the use and re-use of data products, companies should introduce a data marketplace in a fourth step, enabling users to intuitively browse, identify and access the relevant data products. While the marketplace initially serves the convenient internal provision of data products, it can later be opened up to external partners in order to share and monetize data. Central data repositories can be reduced further by virtualizing access to data products.
“The decentralization of data platforms is an important step to uplift data & analytics capabilities within an enterprise and make greater use of data.”
The study is based on an online survey conducted by PwC Germany in conjunction with Statista between December 2021 and January 2022. A total of 152 IT managers from various industries took part, whose companies each have at least 500 employees in Germany. The survey comprised multiple-choice questions on the maturity of data & analytics in companies and the application of a data mesh.