What Are the Differences Between a Data Warehouse and the Hadoop Big Data Platform?
The Hadoop big data platform can be regarded as a next-generation data warehousing system. It has the characteristics of a modern data warehouse and is widely used by enterprises. Conversely, because massively parallel processing (MPP) architectures scale out well, MPP-based data warehousing systems are sometimes classified as big data platforms.
However, data warehouses and the Hadoop platform differ significantly in functionality and user experience across scenarios. The following table compares them.
Feature | Hadoop | Data Warehouse |
---|---|---|
Number of compute nodes | Thousands | Up to 256 |
Data volume | Over 10 PB | Up to 10 PB |
Data type | Relational, semi-relational, and unstructured (voice, images, and video) | Relational only |
Latency | Medium to high | Low |
Application ecosystem | Innovative/AI | Traditional/BI |
Application development API | SQL and APIs in other programming languages, such as MapReduce | Standard database SQL |
Scalability | Unlimited, with comprehensive programming APIs | Limited, supported by UDFs |
Transaction support | Limited | Comprehensive |
Data warehouses and the Hadoop platform complement each other in different scenarios. GaussDB(DWS) on the public cloud integrates seamlessly with Hadoop-based MRS on the public cloud, providing SQL-over-Hadoop data sharing across platforms and services. With GaussDB(DWS), you can use a data warehouse to manage massive data while benefiting from the openness, convenience, and innovation of the Hadoop platform. You can also continue to use the upper-layer applications of conventional data warehouses, especially BI applications.