If the last 20 years have given us anything, it is a diverse and (at times) confusing array of data platforms and solutions: on-prem databases, Hadoop, and more cloud technologies than I could list here if I tried. At this point, I probably learn about one or two new ones a week without even researching. The strategic challenge that ensues is deciding which platform best enables your digital transformation culture. Do you even need a platform? If you’re under the gun like many other companies, you’re probably thinking about transformation and whether your data strategy needs a refresh. Like many of the great answers in life, it depends. This article breaks down some common platforms in use today and maps each one to the kind of data strategy it supports.
There are several types of platforms out there; warehouse, lake, lakehouse, and mesh are among the most common, and each comes with benefits and drawbacks. There are also other design patterns for data platforms, such as fabrics and factories. A few key concepts found across the platforms connect back to the organization’s strategy. For instance, whether an architecture is monolithic or distributed is an essential characteristic that maps back to the goals of the enterprise. If the enterprise pursues an offensive data strategy that empowers product teams, a mesh or fabric provides a better means of doing so than a warehouse or lake. At the same time, the tradeoff with governance and risk becomes important. Distributed data architectures like a mesh distribute everything, including how the data is governed, which means everyone has to help steer the ship. The result is Multiple Versions of Truth (MVoT), whereas a monolithic control model like a single data warehouse encourages a Single Source of Truth (SSoT).
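To make MVoT versus SSoT a little more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the order records, the `active_customers` helper, and the two team definitions are assumptions, not taken from any real platform. It only shows how two domains can compute the "same" metric from the same data and still get different answers.

```python
from datetime import date, timedelta

# Hypothetical order records: (customer_id, order_date)
ORDERS = [
    ("c1", date(2024, 1, 5)),
    ("c2", date(2024, 3, 20)),
    ("c3", date(2023, 11, 2)),
]


def active_customers(orders, as_of, window_days):
    """Return the customers with at least one order in the trailing window."""
    cutoff = as_of - timedelta(days=window_days)
    return {cid for cid, order_date in orders if cutoff <= order_date <= as_of}


AS_OF = date(2024, 3, 31)

# In a distributed model, each domain owns its own definition (MVoT).
marketing_view = active_customers(ORDERS, AS_OF, window_days=30)
finance_view = active_customers(ORDERS, AS_OF, window_days=90)

# The two "truths" disagree, and both are valid within their own domain.
print(sorted(marketing_view))
print(sorted(finance_view))
```

In a single warehouse, one of these definitions would typically be chosen and published as the official one. In a mesh, both can coexist, which is why the definitions need to be documented and governed alongside the data.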
When looking at data strategies for platforms, you may find that more than one is needed. For instance, some data, like financial data, needs a warehouse strategy, while some aspects of product data and user interaction may work better when the data is stored and used closer to the product, with access and integrations back to other data channels. This is similar to the data mart concepts of the early 2000s, where data was processed and stored in different data marts that were rolled up into cubes. I’m sure we all miss explaining to the C-Suite why OLAP versus OLTP was important to consider in data strategies at the time. Lakehouses are an exciting hybrid: a monolithic structure that uses concepts from both warehouse and lake architectures. The pattern of hybridization continues as data platforms change and evolve. Check out the links at the bottom of this article for notes on the history of data warehouses and future trends.
Quick Notes When Re-Thinking Your Data Platform
Data Warehouse
- Primary Architecture: Monolithic
- Shape: Structured
- Properties: Standardized Structures, Easy to Train Business, SSoT, Limited Information Due to Time to Build Data Structures, Wrong Context or Structure for Analysis Types, Difficult for Agile Response
- Strategic Alignment: Defensive
- Learn More: WhichDWArchitectureIsMostSucce-with-cover-page-v2.pdf (d1wqtxts1xzle7.cloudfront.net)
Data Lake
- Primary Architecture: Monolithic
- Shape: Structured or Unstructured, Flat
- Properties: Single Giant Source of Data, Agile and Flexible for Questions, No Predefined Structures to Store Data, Business Can Collect Now and Define Usage Later, MVoT, Bad Governance Can Result in a Data Swamp, Unable to Curate for Specific Purposes, Data Quality is Analyzed Sometimes, Schema on Read
- Strategic Alignment: Balanced, but leaning offensive
- Learn More: International Journal of Scientific Research in Computer Science, Engineering and Information Technology (researchgate.net)
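"Schema on Read" and "Collect Now and Define Usage Later" are easier to see in a few lines of code. The sketch below uses only the Python standard library; the raw events, the `read_with_schema` helper, and the `revenue_schema` mapping are all hypothetical names invented for this illustration. The idea is that records are stored as they arrive and a structure is imposed only when someone asks a question.

```python
import json

# Raw events stored exactly as they arrived; no structure was enforced on write.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "channel": "web"}',
    '{"user": "u2", "amount": 5}',
    '{"user": "u3", "note": "no amount recorded"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) at read time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of rejecting the record.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# The analyst decides today which fields matter and how to type them.
revenue_schema = {"user": str, "amount": float}
revenue_rows = read_with_schema(RAW_EVENTS, revenue_schema)

for row in revenue_rows:
    print(row)
```

Notice that the third record comes back with no amount at all. That flexibility is the appeal of a lake, and it is also how a swamp starts if nobody checks data quality on the way out.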
Data Lakehouse
- Primary Architecture: Monolithic
- Shape: Structured or Unstructured
- Properties: Combines Data Lake and Warehouse Capabilities, Operates on Metadata Layers on top of Lakes, Optimized for Data Science, Can Support ACID Transactions, New Architecture Still Maturing in the Market
- Strategic Alignment: Balanced
- Learn More: What is a Data Lakehouse? - Databricks
Data Mesh
- Primary Architecture: Distributed
- Shape: Structured or Unstructured
- Properties: Utilizes Domain Driven Design, Data-as-a-Product Focus, Self-service Oriented, Experimentation Oriented, Needs a Large Team and a Large Amount of Data to Be Viable, Requires Distributed Governance, Highly Complex, Empowers the Business as Owners Rather than IT-Controlled
- Strategic Alignment: Offensive
- Learn More: What is a Data Mesh — and How Not to Mesh it Up | by Barr Moses | Towards Data Science
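If you want to keep these quick notes handy, they can be captured as a small lookup table. This is only a sketch: the `Platform` class and the `shortlist` function are hypothetical names of my own, and the values simply restate the notes above. It produces a starting shortlist for a conversation, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    architecture: str
    alignment: str


# Values restate the quick notes above.
PLATFORMS = [
    Platform("Data Warehouse", "Monolithic", "Defensive"),
    Platform("Data Lake", "Monolithic", "Balanced, but leaning offensive"),
    Platform("Data Lakehouse", "Monolithic", "Balanced"),
    Platform("Data Mesh", "Distributed", "Offensive"),
]


def shortlist(architecture=None, alignment_keyword=None):
    """Filter platforms by architecture and/or a keyword in the alignment."""
    results = []
    for platform in PLATFORMS:
        if architecture and platform.architecture.lower() != architecture.lower():
            continue
        if alignment_keyword and alignment_keyword.lower() not in platform.alignment.lower():
            continue
        results.append(platform.name)
    return results


print(shortlist(alignment_keyword="offensive"))
print(shortlist(architecture="Monolithic", alignment_keyword="balanced"))
```

Since most organizations end up needing more than one strategy, expect to run this kind of filter once per data domain rather than once for the whole enterprise.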
Other References:
What is a Data Lakehouse? - Databricks
A Brief History of the Data Warehouse - DATAVERSITY
Data Mesh: Design, Benefits, Hype, and Reality | LinkedIn