You can run the CREATE TABLE command to create a table. When creating a table, you can define its columns and data types, constraints, storage orientation, distribution policy, and partitioning.
Example: Use CREATE TABLE to create the table web_returns_p1, with wr_item_sk as the distribution key and range partitioning on wr_returned_date_sk.

    CREATE TABLE web_returns_p1
    (
        wr_returned_date_sk       integer,
        wr_returned_time_sk       integer,
        wr_item_sk                integer NOT NULL,
        wr_refunded_customer_sk   integer
    )
    WITH (orientation = column)
    DISTRIBUTE BY HASH (wr_item_sk)
    PARTITION BY RANGE (wr_returned_date_sk)
    (
        PARTITION p2019 START(20191231) END(20221231) EVERY(10000),
        PARTITION p0 END(maxvalue)
    );

You can define constraints on columns and tables to restrict data in a table. However, there are the following restrictions:
Constraint | Description | Example |
---|---|---|
CHECK constraint | A CHECK constraint requires that values in a specific column satisfy a Boolean expression, that is, the expression must evaluate to true. | Create the products table. The price column must be positive. |
NOT NULL constraint | A NOT NULL constraint specifies that a column cannot contain null values. A NOT NULL constraint is always written as a column constraint. | Create the products table. The values of product_no and name cannot be null. |
UNIQUE constraint | A UNIQUE constraint specifies that the values in a column or a group of columns are all unique. If DISTRIBUTE BY REPLICATION is not specified, the set of columns covered by the UNIQUE constraint must include the distribution column. | Create the products table. The values of product_no must be unique. |
Primary key constraint | A primary key constraint is the combination of a UNIQUE constraint and a NOT NULL constraint. If DISTRIBUTE BY REPLICATION is not specified, the set of columns covered by the primary key constraint must include the distribution column. If a table has a primary key, the primary key column (or group of columns) is used as the distribution key of the table by default. | Create the products table. The primary key constraint is product_no. |
Partial cluster key | A partial cluster key (PCK) uses min/max sparse indexes to quickly filter base tables. A partial cluster key can cover multiple columns, but you are advised to specify no more than two. | Create the products table with the PCK set to product_no. |
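
The example cells in the table above describe statements without showing them. The following is a minimal sketch of what each could look like, assuming a three-column products table. The column types and the explicit DISTRIBUTE BY clauses are illustrative assumptions, not taken from the original. The statements are alternatives to one another, so the sketch drops the table before each one.

```sql
-- CHECK constraint: price must be positive.
DROP TABLE IF EXISTS products;
CREATE TABLE products
(
    product_no integer,
    name       text,
    price      numeric CHECK (price > 0)
);

-- NOT NULL constraint: product_no and name cannot be null.
DROP TABLE IF EXISTS products;
CREATE TABLE products
(
    product_no integer NOT NULL,
    name       text    NOT NULL,
    price      numeric
);

-- UNIQUE constraint: product_no must be unique.
-- The unique column is also the distribution column (assumed choice).
DROP TABLE IF EXISTS products;
CREATE TABLE products
(
    product_no integer UNIQUE,
    name       text,
    price      numeric
)
DISTRIBUTE BY HASH (product_no);

-- Primary key constraint on product_no.
DROP TABLE IF EXISTS products;
CREATE TABLE products
(
    product_no integer PRIMARY KEY,
    name       text,
    price      numeric
)
DISTRIBUTE BY HASH (product_no);

-- Partial cluster key on product_no (column-store table).
DROP TABLE IF EXISTS products;
CREATE TABLE products
(
    product_no integer,
    name       text,
    price      numeric,
    PARTIAL CLUSTER KEY (product_no)
)
WITH (orientation = column);
```

In the UNIQUE and primary key variants, distributing by product_no keeps the constrained column inside the distribution key, which is the restriction the table describes for tables that do not use DISTRIBUTE BY REPLICATION.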
The roundrobin distribution mode is supported only by cluster version 8.1.2 or later.
Policy | Description | Scenario | Advantages/Disadvantages |
---|---|---|---|
Replication | A full copy of the table data is stored on each DN in the cluster. | Small tables and dimension tables | |
Hash | Table data is distributed across all DNs in the cluster. | Fact tables containing a large amount of data | |
Round-robin (polling) | Rows are sent to the DNs in turn, so data is evenly distributed across the DNs. | Fact tables that contain a large amount of data and for which no suitable distribution column can be found in hash mode | |
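
As a rough illustration of the three policies, the sketch below declares one table per policy. The table and column names (dim_region, fact_orders, fact_events) are hypothetical and chosen only to match the scenarios in the table above. The round-robin variant assumes a cluster of version 8.1.2 or later.

```sql
-- Replication: a small dimension table, stored in full on every DN.
CREATE TABLE dim_region
(
    region_id   integer,
    region_name text
)
DISTRIBUTE BY REPLICATION;

-- Hash: a large fact table, spread across DNs by the hash of order_id.
CREATE TABLE fact_orders
(
    order_id  integer NOT NULL,
    region_id integer,
    amount    numeric
)
DISTRIBUTE BY HASH (order_id);

-- Round-robin: a large fact table with no suitable distribution column.
-- Assumes cluster version 8.1.2 or later.
CREATE TABLE fact_events
(
    event_time timestamp,
    payload    text
)
DISTRIBUTE BY ROUNDROBIN;
```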
Selecting a Distribution Key
If the hash distribution mode is used, a distribution key must be specified for the user table. When a record is inserted, the system hashes it based on the distribution key and then stores it on the corresponding DN.
Select a hash distribution key based on the following principles:
For a hash-distributed table, an inappropriate distribution key may cause data skew or poor I/O performance on certain DNs. Therefore, check the table to ensure that data is evenly distributed across the DNs. You can run the following SQL statement to check for data skew:

    SELECT xc_node_id, count(1)
    FROM tablename
    GROUP BY xc_node_id
    ORDER BY xc_node_id DESC;

xc_node_id corresponds to a DN. Generally, over 5% difference between the amount of data on different DNs is regarded as data skew. If the difference is over 10%, choose another distribution key.
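
To compare the counts against the 5% and 10% thresholds more easily, the query can be extended to report percentages directly. This is only a sketch: it uses customer_t1 as the table name, and it assumes that "difference" is measured as each DN's deviation from the average row count per DN, which the original does not define. The column aliases are hypothetical.

```sql
SELECT xc_node_id,
       count(1) AS row_count,
       -- Share of all rows held by this DN.
       round(100.0 * count(1) / sum(count(1)) OVER (), 2) AS pct_of_total,
       -- Deviation of this DN from the average rows per DN (assumed skew measure).
       round(100.0 * (count(1) - avg(count(1)) OVER ())
                   / avg(count(1)) OVER (), 2) AS pct_from_avg
FROM customer_t1
GROUP BY xc_node_id
ORDER BY row_count DESC;
```

A large positive or negative pct_from_avg on some DNs suggests that the current distribution key spreads rows unevenly.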
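
    -- List information about all tables.
    SELECT * FROM pg_tables;
    -- Show the structure of customer_t1 (meta-command of the SQL client).
    \d+ customer_t1;
    -- Count the rows in the table.
    SELECT count(*) FROM customer_t1;
    -- Query all data in the table.
    SELECT * FROM customer_t1;
    -- Query a single column.
    SELECT c_customer_sk FROM customer_t1;
    -- Query the distinct values of a column.
    SELECT DISTINCT c_customer_sk FROM customer_t1;
    -- Query the rows that match a condition.
    SELECT * FROM customer_t1 WHERE c_customer_sk = 3869;
    -- Query all data, sorted by a column.
    SELECT * FROM customer_t1 ORDER BY c_customer_sk;
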
Exercise caution when running the DROP TABLE and TRUNCATE TABLE statements. After a table is deleted, data cannot be restored.
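
    -- Delete the table, including its definition and data.
    DROP TABLE customer_t1;
    -- Delete all rows but keep the table definition.
    TRUNCATE TABLE customer_t1;
    -- Delete all rows one by one; the table definition is kept.
    DELETE FROM customer_t1;
    -- Delete only the rows that match a condition.
    DELETE FROM customer_t1 WHERE c_customer_sk = 3869;
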
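
One cautious way to run a conditional DELETE is to check how many rows the condition matches before removing them, inside an explicit transaction. The sketch below assumes the customer_t1 table and the c_customer_sk value used above, and it is not a substitute for a backup.

```sql
START TRANSACTION;

-- Check how many rows the condition matches before deleting them.
SELECT count(*) FROM customer_t1 WHERE c_customer_sk = 3869;

DELETE FROM customer_t1 WHERE c_customer_sk = 3869;

-- Confirm that the matching rows are gone within this transaction.
SELECT count(*) FROM customer_t1 WHERE c_customer_sk = 3869;

-- Run COMMIT to keep the change, or ROLLBACK instead to undo it.
COMMIT;
```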