[Binary file additions: new PNG/JPG image assets under doc/component-operation-guide-lts/source/_static/images/, from en-us_image_0000001295739288.png through en-us_image_0000001296219520.png (final entry truncated in the source). Each stanza follows the standard form "new file mode 100644 / index 0000000..<hash> / Binary files /dev/null and b/<path> differ"; the binary image contents are not representable in the text diff.]
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219520.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219520.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219532.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219532.png new file mode 100644 index 0000000..fb5e471 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219532.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219564.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219564.png new file mode 100644 index 0000000..fc479f2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219564.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219580.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219580.png new file mode 100644 index 0000000..b475254 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219580.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219596.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219596.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219596.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219600.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219600.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219600.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219628.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219628.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219628.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219636.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219636.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219636.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219644.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219644.jpg new file mode 100644 index 0000000..ee938d0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219644.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219648.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219648.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219648.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219652.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219652.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219652.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219656.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219656.png new file mode 100644 index 0000000..17a060e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219656.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219676.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219676.png new file mode 100644 index 0000000..1a53295 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219676.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219680.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219680.png new file mode 100644 index 0000000..08417f1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219680.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219696.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219696.png new file mode 100644 index 0000000..15414a8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219696.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219712.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219712.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219712.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219716.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219716.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219716.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219724.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219724.jpg new file mode 100644 index 0000000..53b6776 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219724.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219728.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219728.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219728.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219736.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219736.png new file mode 100644 index 0000000..893bfa4 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219736.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219740.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219740.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219740.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219748.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219748.jpg new file mode 100644 index 0000000..fdba4bb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219748.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219752.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219752.png new file mode 100644 index 0000000..39b3bb4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219752.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219756.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219756.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219756.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219764.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219764.png new file mode 100644 index 0000000..5712e6f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219764.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219768.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219768.jpg new file mode 100644 index 0000000..3ac014d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219768.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219776.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219776.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219776.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739701.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739701.png new file mode 100644 index 0000000..a3c529f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739701.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739705.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739705.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739705.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739717.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739717.jpg new 
file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739717.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png new file mode 100644 index 0000000..284cb79 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739729.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739729.jpg new file mode 100644 index 0000000..4519259 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739729.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739733.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739733.jpg new file mode 100644 index 0000000..85b923d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739733.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739753.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739753.png new file mode 100644 index 0000000..579d63d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739753.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739761.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739761.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739761.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739765.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739765.png new file mode 100644 index 0000000..6d349fd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739765.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739769.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739769.jpg new file mode 100644 index 0000000..fdba4bb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739769.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739813.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739813.png new file mode 100644 index 0000000..6451685 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739813.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739817.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739817.png new file mode 100644 index 0000000..fd691d6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739817.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739821.jpg 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739821.jpg new file mode 100644 index 0000000..4f2665c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739821.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739825.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739825.jpg new file mode 100644 index 0000000..9a47c32 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739825.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739829.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739829.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739829.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739837.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739837.png new file mode 100644 index 0000000..1c171d9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739837.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739841.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739841.png new file mode 100644 index 0000000..f596e19 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739841.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739849.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739849.png new file mode 100644 index 0000000..26e4f1f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739849.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739853.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739853.png new file mode 100644 index 0000000..1c171d9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739853.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739857.gif b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739857.gif new file mode 100644 index 0000000..1470c6b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739857.gif differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739865.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739865.png new file mode 100644 index 0000000..f1f9114 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739865.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739869.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739869.png new file mode 100644 index 0000000..99b0751 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739869.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739873.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739873.png new file mode 100644 index 0000000..08417f1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739873.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739877.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739877.jpg new file mode 100644 index 0000000..b65526a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739877.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739881.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739881.jpg new file mode 100644 index 0000000..e3ae3a2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739881.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739889.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739889.png new file mode 100644 index 0000000..9b039d5 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739889.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739893.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739893.png new file mode 100644 index 0000000..5816e78 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739893.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739897.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739897.png new file mode 100644 index 0000000..70b4b7d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739897.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739925.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739925.png new file mode 100644 index 0000000..8d7f24d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739925.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739933.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739933.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739933.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739937.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739937.jpg new file mode 100644 index 0000000..4f2665c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739937.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739953.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739953.png new file mode 100644 index 0000000..4d0c33c Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739953.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739961.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739961.jpg new file mode 100644 index 0000000..da43e0f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739961.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739969.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739969.jpg new file mode 100644 index 0000000..6a6b0df Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739969.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739989.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739989.jpg new file mode 100644 index 0000000..d98ef02 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739989.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739993.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739993.png new file mode 100644 index 0000000..0cc2cb1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739993.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739997.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739997.png new file mode 100644 index 0000000..5816e78 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739997.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740001.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740001.jpg new file mode 100644 index 0000000..ea2e589 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740001.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740037.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740037.jpg new file mode 100644 index 0000000..e3ae3a2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740037.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png new file mode 100644 index 0000000..ae7f920 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740049.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740049.png new file mode 100644 index 0000000..d0ab36a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740049.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740061.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740061.png new 
file mode 100644 index 0000000..d0c259d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740061.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740077.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740077.png new file mode 100644 index 0000000..e5f282b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740077.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740081.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740081.png new file mode 100644 index 0000000..d7daddd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740081.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740093.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740093.jpg new file mode 100644 index 0000000..c2321e0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740093.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740097.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740097.png new file mode 100644 index 0000000..88d40c0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740097.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740109.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740109.png new file mode 100644 index 0000000..f858994 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740109.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740113.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740113.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740113.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740117.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740117.png new file mode 100644 index 0000000..c2e7355 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740117.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740137.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740137.jpg new file mode 100644 index 0000000..1b9a5a0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740137.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740141.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740141.png new file mode 100644 index 0000000..88c4617 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740141.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740145.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740145.png new file mode 100644 index 0000000..e1dc77e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740145.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740149.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740149.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740149.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740153.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740153.png new file mode 100644 index 0000000..88c4617 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740153.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740157.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740157.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740157.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740165.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740165.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740165.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740169.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740169.png new file mode 100644 index 0000000..54760c0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740169.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349058841.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349058841.png new file mode 100644 index 0000000..dbfd079 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349058841.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059513.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059513.png new file mode 100644 index 0000000..6316e1b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059513.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059517.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059517.png new file mode 100644 index 0000000..9c631cb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059517.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059541.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059541.jpg new file mode 100644 index 0000000..a953036 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059541.jpg differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png new file mode 100644 index 0000000..0d135b4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059569.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059569.png new file mode 100644 index 0000000..79d34d7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059569.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059577.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059577.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059577.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059585.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059585.png new file mode 100644 index 0000000..5383977 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059585.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059589.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059589.jpg new file mode 100644 index 0000000..4519259 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059589.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059605.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059605.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059605.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059629.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059629.png new file mode 100644 index 0000000..9ba86ca Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059629.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059637.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059637.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059637.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059641.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059641.png new file mode 100644 index 0000000..1772974 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059641.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059645.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059645.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059645.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059649.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059649.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059649.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059657.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059657.png new file mode 100644 index 0000000..8f7fb24 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059657.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059673.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059673.png new file mode 100644 index 0000000..1c171d9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059673.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059681.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059681.png new file mode 100644 index 0000000..88c4617 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059681.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059685.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059685.png new file mode 100644 index 0000000..1a53295 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059685.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059689.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059689.jpg new file mode 100644 index 0000000..d780caa Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059689.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059693.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059693.png new file mode 100644 index 0000000..5712e6f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059693.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059705.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059705.png new file mode 100644 index 0000000..81e9849 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059705.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059713.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059713.png new file mode 100644 index 0000000..5f0083b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059713.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059721.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059721.png new 
file mode 100644 index 0000000..807d780 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059721.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059729.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059729.png new file mode 100644 index 0000000..8d3d24b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059729.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059741.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059741.png new file mode 100644 index 0000000..5816e78 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059741.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059745.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059745.png new file mode 100644 index 0000000..f832929 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059745.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059749.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059749.jpg new file mode 100644 index 0000000..0be2621 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059749.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059753.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059753.jpg new file mode 100644 index 0000000..8703205 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059753.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059761.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059761.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059761.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059765.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059765.jpg new file mode 100644 index 0000000..ff6f642 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059765.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059773.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059773.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059773.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059781.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059781.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059781.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059785.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059785.png new file mode 100644 index 0000000..e8313fc Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059785.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059809.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059809.png new file mode 100644 index 0000000..5712e6f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059809.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059825.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059825.png new file mode 100644 index 0000000..4f7c296 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059825.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059845.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059845.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059845.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059853.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059853.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059853.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059857.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059857.png new file mode 100644 index 0000000..1608a79 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059857.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059865.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059865.png new file mode 100644 index 0000000..19dcf20 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059865.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059869.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059869.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059869.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059873.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059873.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059873.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059877.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059877.png new file mode 100644 index 0000000..66cf313 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059877.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059881.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059881.png new file mode 100644 index 0000000..38a957a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059881.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059893.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059893.png new file mode 100644 index 0000000..e8313fc Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059893.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059901.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059901.png new file mode 100644 index 0000000..a7baeb7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059901.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059905.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059905.jpg new file mode 100644 index 0000000..987f49c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059905.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059925.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059925.png new file mode 100644 index 0000000..03226f1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059925.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059929.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059929.png new file mode 100644 index 0000000..26ecfa1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059929.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059933.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059933.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059933.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png new file mode 100644 index 0000000..06737e1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059949.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059949.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059949.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059961.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059961.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059961.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059969.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059969.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059969.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059973.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059973.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059973.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059977.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059977.png new file mode 100644 index 0000000..88c4617 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059977.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059981.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059981.jpg new file mode 100644 index 0000000..580705c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059981.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059985.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059985.png new file mode 100644 index 0000000..e92c89d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059985.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059997.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059997.jpg new file mode 100644 index 0000000..ce88e71 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059997.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139389.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139389.png new file mode 100644 index 0000000..a9fb4cf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139389.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139405.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139405.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139405.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png new file mode 100644 index 0000000..c84e94e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png new 
file mode 100644 index 0000000..9632c84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139421.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139421.png new file mode 100644 index 0000000..d4e4ed2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139421.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139425.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139425.png new file mode 100644 index 0000000..89000c9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139425.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139437.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139437.png new file mode 100644 index 0000000..0bc541a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139437.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139449.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139449.png new file mode 100644 index 0000000..17a060e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139449.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139461.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139461.png new file mode 100644 index 0000000..5a9fea7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139461.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139497.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139497.png new file mode 100644 index 0000000..ee4ef17 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139497.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139501.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139501.png new file mode 100644 index 0000000..ce2ad69 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139501.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139513.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139513.png new file mode 100644 index 0000000..4273e56 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139513.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139529.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139529.png new file mode 100644 index 0000000..b5f56eb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139529.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139545.jpg 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139545.jpg new file mode 100644 index 0000000..0fd0d3f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139545.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139549.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139549.png new file mode 100644 index 0000000..824e67d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139549.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139553.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139553.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139553.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139561.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139561.jpg new file mode 100644 index 0000000..d3b2d60 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139561.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139569.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139569.png new file mode 100644 index 0000000..4aa6464 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139569.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139581.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139581.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139581.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139601.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139601.png new file mode 100644 index 0000000..233cd56 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139601.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139609.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139609.png new file mode 100644 index 0000000..fe0c828 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139609.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139617.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139617.png new file mode 100644 index 0000000..3baf4af Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139617.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139645.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139645.png new file mode 100644 index 0000000..ce2ad69 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139645.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139657.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139657.png new file mode 100644 index 0000000..f6a3359 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139657.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139661.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139661.png new file mode 100644 index 0000000..18c36c8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139661.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139673.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139673.jpg new file mode 100644 index 0000000..fa223fb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139673.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139677.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139677.png new file mode 100644 index 0000000..f22baec Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139677.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139685.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139685.png new file mode 100644 index 0000000..799c857 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139685.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139689.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139689.png new file mode 100644 index 0000000..26fde0f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139689.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139717.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139717.png new file mode 100644 index 0000000..073f84c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139717.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139725.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139725.jpg new file mode 100644 index 0000000..d81b6b4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139725.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139729.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139729.png new file mode 100644 index 0000000..cf00888 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139729.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139733.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139733.png new file mode 100644 index 0000000..9ad0bc6 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139733.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139741.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139741.png new file mode 100644 index 0000000..3ae6853 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139741.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139745.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139745.png new file mode 100644 index 0000000..79c99ed Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139745.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139753.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139753.jpg new file mode 100644 index 0000000..c80477c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139753.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139761.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139761.png new file mode 100644 index 0000000..e5f282b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139761.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139773.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139773.png new file mode 100644 index 0000000..7673582 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139773.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139781.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139781.jpg new file mode 100644 index 0000000..43c85cf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139781.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139801.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139801.png new file mode 100644 index 0000000..c9c769b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139801.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139809.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139809.jpg new file mode 100644 index 0000000..370fdf4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139809.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139813.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139813.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139813.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139821.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139821.png new 
file mode 100644 index 0000000..18bd857 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139821.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139825.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139825.jpg new file mode 100644 index 0000000..9c09f30 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139825.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139833.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139833.png new file mode 100644 index 0000000..5602e1a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139833.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139837.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139837.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139837.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139841.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139841.jpg new file mode 100644 index 0000000..269c0ee Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139841.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139845.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139845.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139845.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139849.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139849.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139849.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139853.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139853.png new file mode 100644 index 0000000..6622de1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139853.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139861.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139861.png new file mode 100644 index 0000000..5816e78 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139861.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258973.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258973.jpg new file mode 100644 index 0000000..10c8fff Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258973.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258977.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258977.png new file mode 100644 index 0000000..9c631cb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258977.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258993.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258993.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258993.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258997.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258997.jpg new file mode 100644 index 0000000..4f2665c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349258997.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png new file mode 100644 index 0000000..6a06cad Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png new file mode 100644 index 0000000..934505b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259013.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259013.jpg new file mode 100644 index 0000000..4f3a02a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259013.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259025.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259025.png new file mode 100644 index 0000000..2f0ab66 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259025.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259033.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259033.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259033.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259037.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259037.png new file mode 100644 index 0000000..f98d63a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259037.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259041.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259041.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259041.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259045.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259045.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259045.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259089.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259089.png new file mode 100644 index 0000000..ef758e2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259089.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259093.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259093.jpg new file mode 100644 index 0000000..76439ca Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259093.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259097.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259097.jpg new file mode 100644 index 0000000..21d0411 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259097.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259105.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259105.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259105.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259113.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259113.png new file mode 100644 index 0000000..848b318 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259113.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259117.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259117.jpg new file mode 100644 index 0000000..9a47c32 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259117.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259145.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259145.png new file mode 100644 index 0000000..a422c9a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259145.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259161.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259161.png new file mode 100644 index 0000000..1772974 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259161.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259169.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259169.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259169.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259193.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259193.png new file mode 100644 index 0000000..8f7fb24 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259193.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259197.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259197.png new file mode 100644 index 0000000..88d40c0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259197.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259201.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259201.png new file mode 100644 index 0000000..a7f7197 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259201.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259205.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259205.png new file mode 100644 index 0000000..08adb71 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259205.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259209.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259209.jpg new file mode 100644 index 0000000..b5427d1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259209.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259213.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259213.jpg new file mode 100644 index 0000000..4f2665c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259213.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259217.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259217.jpg new file mode 100644 index 0000000..1b7bea9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259217.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259221.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259221.png new file mode 100644 index 0000000..8c1d5ae Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259221.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259225.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259225.png new file mode 100644 index 0000000..c68db1b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259225.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259233.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259233.png new 
file mode 100644 index 0000000..e2407bc Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259233.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259241.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259241.png new file mode 100644 index 0000000..e8313fc Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259241.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259249.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259249.jpg new file mode 100644 index 0000000..8553811 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259249.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259265.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259265.png new file mode 100644 index 0000000..e3d2e0e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259265.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259269.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259269.png new file mode 100644 index 0000000..9ccebe3 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259269.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259273.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259273.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259273.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259277.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259277.jpg new file mode 100644 index 0000000..61a9c4c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259277.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259305.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259305.png new file mode 100644 index 0000000..807d780 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259305.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259309.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259309.png new file mode 100644 index 0000000..25ded40 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259309.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259313.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259313.png new file mode 100644 index 0000000..4467d4b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259313.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259317.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259317.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259317.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259321.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259321.jpg new file mode 100644 index 0000000..cbd9123 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259321.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259325.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259325.png new file mode 100644 index 0000000..66cf313 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259325.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259329.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259329.png new file mode 100644 index 0000000..7b28912 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259329.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259357.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259357.jpg new file mode 100644 index 0000000..ab5f657 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259357.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259365.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259365.png new file mode 100644 index 0000000..5816e78 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259365.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259381.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259381.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259381.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259393.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259393.png new file mode 100644 index 0000000..7f3e511 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259393.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259401.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259401.png new file mode 100644 index 0000000..07374dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259401.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259405.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259405.png new file mode 100644 index 0000000..c4c142d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259405.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259409.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259409.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259409.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259417.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259417.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259417.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259421.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259421.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259421.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259425.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259425.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259425.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png new file mode 100644 index 0000000..d40da22 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387892350.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387892350.png new file mode 100644 index 0000000..17f7545 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387892350.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png new file mode 100644 index 0000000..72f5360 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388071772.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388071772.png new file mode 100644 index 0000000..97e60dd Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388071772.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388325592.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388325592.png new file mode 100644 index 0000000..bcf2510 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388325592.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388357306.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388357306.png new file mode 100644 index 0000000..1cca7a4 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388357306.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388372074.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388372074.png new file mode 100644 index 0000000..1b63bad Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001388372074.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147806.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147806.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147806.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147810.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147810.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389147810.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389152774.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389152774.png new file mode 100644 index 0000000..4a9dac9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389152774.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389307342.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389307342.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389307342.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389312782.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389312782.png new file mode 100644 index 0000000..91c2daf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389312782.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389466918.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389466918.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389466918.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389506524.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389506524.png new file mode 100644 index 0000000..abf560a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389506524.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389626890.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389626890.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389626890.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389632602.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389632602.png new 
file mode 100644 index 0000000..0f81161 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389632602.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389636106.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389636106.png new file mode 100644 index 0000000..6183a6d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001389636106.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001391556190.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001391556190.png new file mode 100644 index 0000000..871ef6a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001391556190.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438241209.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438241209.png new file mode 100644 index 0000000..093545d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438241209.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438277693.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438277693.png new file mode 100644 index 0000000..f0beb4c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438277693.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438420609.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438420609.png new file mode 100644 index 0000000..bcb0d87 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438420609.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png new file mode 100644 index 0000000..1a7780b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438532613.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438532613.png new file mode 100644 index 0000000..5931bde Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438532613.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438640405.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438640405.png new file mode 100644 index 0000000..6907eb4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438640405.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439347017.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439347017.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439347017.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439626593.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439626593.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439626593.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746629.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746629.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746629.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746633.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746633.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439746633.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439786829.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439786829.png new file mode 100644 index 0000000..f8c4ca6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001439786829.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440850393.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440850393.png new file mode 100644 index 0000000..5870f42 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440850393.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440970317.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440970317.png new file mode 100644 index 0000000..51558c1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001440970317.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png new file mode 100644 index 0000000..7af9c59 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441092221.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441092221.png new file mode 100644 index 0000000..c5cada3 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441092221.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png new file mode 100644 index 0000000..62de180 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441209301.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441209301.png new file mode 100644 index 0000000..e2395ad Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441209301.png differ diff --git 
a/doc/component-operation-guide-lts/source/appendix/accessing_fusioninsight_manager.rst b/doc/component-operation-guide-lts/source/appendix/accessing_fusioninsight_manager.rst
new file mode 100644
index 0000000..0a7ed23
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/appendix/accessing_fusioninsight_manager.rst
@@ -0,0 +1,132 @@
+:original_name: mrs_01_2124.html
+
+.. _mrs_01_2124:
+
+Accessing FusionInsight Manager
+===============================
+
+Scenario
+--------
+
+FusionInsight Manager is used to monitor, configure, and manage clusters. After the cluster is installed, you can use the **admin** account to log in to FusionInsight Manager.
+
+Currently, you can access FusionInsight Manager using the following methods:
+
+- :ref:`Accessing FusionInsight Manager Using EIP `
+- :ref:`Accessing FusionInsight Manager Using Direct Connect `
+- :ref:`Accessing FusionInsight Manager from an ECS `
+
+You can switch the access method between **EIP** and **Direct Connect** on the MRS console by performing the following steps:
+
+Log in to the MRS console and click the target cluster name to go to its details page. In the **Basic Information** area, click |image1| on the right of **Access Manager**.
+
+.. note::
+
+   If you cannot log in to the WebUI of the component, access FusionInsight Manager by referring to :ref:`Accessing FusionInsight Manager from an ECS `.
+
+.. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_section17594144112487:
+
+Accessing FusionInsight Manager Using EIP
+------------------------------------------
+
+#. Log in to the MRS management console.
+
+#. In the navigation pane, choose **Clusters** > **Active Clusters**. Click the target cluster name to access the cluster details page.
+
+#. Click **Access Manager** next to **MRS Manager**. In the displayed dialog box, configure the EIP information.
+
+   a. If no EIP is bound during MRS cluster creation, select an available EIP from the EIP drop-down list or click **Manage EIP** to create an EIP. If you bound an EIP when creating the cluster, go to :ref:`3.b `.
+
+   b. .. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_li59591846143810:
+
+      Select the security group to which the security group rule to be added belongs. The security group is configured when the cluster is created.
+
+   c. Add a security group rule. By default, the prefilled rule is used to access the EIP. To enable multiple IP address segments to access Manager, see steps :ref:`6 ` to :ref:`9 `. If you want to view, modify, or delete a security group rule, click **Manage Security Group Rule**.
+
+   d. Select the information to be confirmed and click **OK**.
+
+#. Click **OK**. The Manager login page is displayed.
+
+#. Enter the default username **admin** and the password set during cluster creation, and click **Log In**. The Manager page is displayed.
+
+#. .. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_en-us_topic_0035209594_li1049410469610:
+
+   On the MRS management console, choose **Clusters** > **Active Clusters**. Click the target cluster name to access the cluster details page.
+
+   .. note::
+
+      To grant other users the permission to access Manager, perform :ref:`6 ` to :ref:`9 ` to add the users' public IP addresses to the trusted IP address range.
+
+#. Click **Add Security Group Rule** on the right of **EIP**.
+
+#. On the **Add Security Group Rule** page, add the IP address segment for users to access the public network and select **I confirm that the authorized object is a trusted public IP address range. Do not use 0.0.0.0/0. Otherwise, security risks may arise.**
+
+   By default, the IP address used for accessing the public network is filled in. You can change the IP address segment as required. To enable multiple IP address segments, repeat steps :ref:`6 ` to :ref:`9 `. If you want to view, modify, or delete a security group rule, click **Manage Security Group Rule**.
+
+#. .. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_en-us_topic_0035209594_li035723593115:
+
+   Click **OK**.
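+
+If the Manager login page cannot be opened after the rule is added, a quick reachability check from a whitelisted host can help separate a security group problem from a browser problem. The following is only a sketch: ``<EIP address>`` and ``<port>`` are placeholders for the Manager access address and port shown in the access dialog box, not values taken from this guide.
+
+.. code-block:: console
+
+   # Placeholders: replace <EIP address> and <port> with the Manager access
+   # address displayed on the MRS console before running the check.
+   curl -k -I https://<EIP address>:<port>/
+
+A connection timeout here usually points to the security group rule (source IP address segment or port), while any HTTP response suggests that the login page should also load in the browser.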
+
+.. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_section12101164416719:
+
+Accessing FusionInsight Manager Using Direct Connect
+-------------------------------------------------------
+
+When **Direct Connect** is enabled in the environment, you can access FusionInsight Manager using Direct Connect as a system administrator.
+
+#. Log in to the MRS management console.
+#. In the navigation pane, choose **Clusters** > **Active Clusters**. Click the target cluster name to access the cluster details page.
+#. Click **Access Manager** next to **MRS Manager**. In the displayed dialog box, select **Direct Connect**.
+#. Click **OK**. The Manager login page is displayed.
+#. Enter the default username **admin** and the password set during cluster creation, and click **Log In**. The Manager page is displayed.
+
+.. _mrs_01_2124__en-us_topic_0000001219029417_en-us_topic_0000001173471140_section20880102283115:
+
+Accessing FusionInsight Manager from an ECS
+--------------------------------------------
+
+#. On the MRS management console, click **Clusters**.
+
+#. On the **Active Clusters** page, click the name of the specified cluster.
+
+   Record the **AZ**, **VPC**, and **Default Security Group** of the cluster.
+
+#. On the homepage of the management console, choose **Service List** > **Elastic Cloud Server** to switch to the ECS management console and create an ECS.
+
+   -  The **AZ**, **VPC**, and **Security Group** of the ECS must be the same as those of the cluster to be accessed.
+   -  Select a Windows public image, for example, the standard image **Windows Server 2012 R2 Standard 64bit(40GB)**.
+   -  For details about other configuration parameters, see **Elastic Cloud Server > User Guide > Getting Started > Creating and Logging In to a Windows ECS**.
+
+   .. note::
+
+      If the security group of the ECS is different from the **Default Security Group** of the Master node, you can modify the configuration using either of the following methods:
+
+      -  Change the security group of the ECS to the default security group of the Master node. For details, see **Elastic Cloud Server** > **User Guide** > **Security Group** > **Changing a Security Group**.
+      -  Add two security group rules to the security groups of the Master and Core nodes to enable the ECS to access the cluster. Set **Protocol** to **TCP** and set the **Ports** of the two security group rules to **28443** and **20009**, respectively. For details, see **Virtual Private Cloud > User Guide > Security > Security Group > Adding a Security Group Rule**.
+
+#. On the VPC management console, apply for an EIP and bind it to the ECS.
+
+   For details, see **Virtual Private Cloud** > **User Guide** > **Elastic IP** > **Assigning an EIP and Binding It to an ECS**.
+
+#. Log in to the ECS.
+
+   The Windows system account, password, EIP, and the security group rules are required for logging in to the ECS. For details, see **Elastic Cloud Server > User Guide > Instances > Logging In to a Windows ECS**.
+
+#. On the Windows remote desktop, use your browser to access Manager.
+
+   For example, you can use Internet Explorer 11 on Windows Server 2012.
+
+   The address for accessing Manager is the address of the MRS Manager page. Enter the name and password of the cluster user, for example, user **admin**.
+
+   |image2|
+
+   .. note::
+
+      -  If you access Manager as another cluster user, change the password upon your first access. The new password must meet the requirements of the current password complexity policies. For details, contact the system administrator.
+      -  By default, a user is locked after entering an incorrect password five consecutive times. The user is automatically unlocked after 5 minutes.
+
+#. To log out of FusionInsight Manager, move the cursor to |image3| in the upper right corner and click **Log Out**.
+
+.. |image1| image:: /_static/images/en-us_image_0000001295900152.png
+.. |image2| image:: /_static/images/en-us_image_0000001388357306.png
+.. |image3| image:: /_static/images/en-us_image_0000001438277693.png
diff --git a/doc/component-operation-guide-lts/source/appendix/index.rst b/doc/component-operation-guide-lts/source/appendix/index.rst
new file mode 100644
index 0000000..68db001
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/appendix/index.rst
@@ -0,0 +1,18 @@
+:original_name: mrs_01_2122.html
+
+.. _mrs_01_2122:
+
+Appendix
+========
+
+- :ref:`Modifying Cluster Service Configuration Parameters `
+- :ref:`Accessing FusionInsight Manager `
+- :ref:`Using an MRS Client `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   modifying_cluster_service_configuration_parameters
+   accessing_fusioninsight_manager
+   using_an_mrs_client/index
diff --git a/doc/component-operation-guide-lts/source/appendix/modifying_cluster_service_configuration_parameters.rst b/doc/component-operation-guide-lts/source/appendix/modifying_cluster_service_configuration_parameters.rst
new file mode 100644
index 0000000..8662889
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/appendix/modifying_cluster_service_configuration_parameters.rst
@@ -0,0 +1,50 @@
+:original_name: mrs_01_1293.html
+
+.. _mrs_01_1293:
+
+Modifying Cluster Service Configuration Parameters
+==================================================
+
+You can modify the configuration parameters of each service on FusionInsight Manager.
+
+#. Log in to FusionInsight Manager.
+
+#. Choose **Cluster** > *Name of the desired cluster* > **Services**.
+
+#. Click the specified service name on the service management page.
+
+#. Click **Configuration**.
+
+   The **Basic Configuration** tab page is displayed by default. To modify more parameters, click the **All Configurations** tab. The navigation tree displays all configuration parameters of the service. The level-1 nodes in the navigation tree are service names or role names. The parameter category is displayed after the level-1 node is expanded.
+
+#. In the navigation tree, select the specified parameter category and change the parameter values on the right.
+
+   If you are not sure about the location of a parameter, you can enter the parameter name in the search box in the upper right corner. The system searches for the parameter in real time and displays the result.
+
+#. Click **Save**. In the confirmation dialog box, click **OK**.
+ +#. Wait until the message **Operation successful** is displayed. Click **Finish**. + + The configuration is modified. + + Check whether there is any service whose configuration has expired in the cluster. If yes, restart the corresponding service or role instance for the configuration to take effect. + +Modify the configuration parameters of each service on the cluster management page of the MRS management console. + +#. Log in to the MRS console. In the left navigation pane, choose **Clusters** > **Active Clusters**, and click a cluster name. + +#. Choose **Components** > *Name of the desired service* > **Service Configuration**. + + The **Basic Configuration** tab page is displayed by default. To modify more parameters, click the **All Configurations** tab. The navigation tree displays all configuration parameters of the service. The level-1 nodes in the navigation tree are service names or role names. The parameter category is displayed after the level-1 node is expanded. + +#. In the navigation tree, select the specified parameter category and change the parameter values on the right. + + If you are not sure about the location of a parameter, you can enter the parameter name in search box in the upper right corner. The system searches for the parameter in real time and displays the result. + +#. Click **Save Configuration**. In the displayed dialog box, click **OK**. + +#. Wait until the message **Operation successful** is displayed. Click **Finish**. + + The configuration is modified. + + Check whether there is any service whose configuration has expired in the cluster. If yes, restart the corresponding service or role instance for the configuration to take effect. diff --git a/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/index.rst b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/index.rst new file mode 100644 index 0000000..95fbf45 --- /dev/null +++ b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_0787.html + +.. _mrs_01_0787: + +Using an MRS Client +=================== + +- :ref:`Using an MRS Client on Nodes Inside a MRS Cluster ` +- :ref:`Using an MRS Client on Nodes Outside a MRS Cluster ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_an_mrs_client_on_nodes_inside_a_mrs_cluster + using_an_mrs_client_on_nodes_outside_a_mrs_cluster diff --git a/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_inside_a_mrs_cluster.rst b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_inside_a_mrs_cluster.rst new file mode 100644 index 0000000..ccecff8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_inside_a_mrs_cluster.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_0788.html + +.. _mrs_01_0788: + +Using an MRS Client on Nodes Inside a MRS Cluster +================================================= + +Scenario +-------- + +Before using the client, you need to install the client. For example, the installation directory is **/opt/hadoopclient**. + +Procedure +--------- + +#. Log in to the Master node in the cluster as user **root**. + +#. Run the **sudo su -** **omm** command to switch the current user. + +#. Run the following command to go to the client directory: + + **cd /opt/hadoopclient** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. 
If the Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step: + + **kinit** *MRS cluster user* + + Example: **kinit admin** + + .. note:: + + User **admin** is created by default for MRS clusters with Kerberos authentication enabled and is used for administrators to maintain the clusters. + +#. Run the client command of a component directly. + + For example, run the **hdfs dfs -ls /** command to view files in the HDFS root directory. diff --git a/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst new file mode 100644 index 0000000..62f19d1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst @@ -0,0 +1,117 @@ +:original_name: mrs_01_0800.html + +.. _mrs_01_0800: + +Using an MRS Client on Nodes Outside a MRS Cluster +================================================== + +Scenario +-------- + +After a client is installed, you can use the client on a node outside an MRS cluster. + +Prerequisites +------------- + +- A Linux ECS has been prepared. For details about the OS and its version of the ECS, see :ref:`Table 1 `. + + .. _mrs_01_0800__en-us_topic_0000001237758831_en-us_topic_0270713152_en-us_topic_0264269418_table40818788104630: + + .. table:: **Table 1** Reference list + + +-------------------------+-----------------------+-------------------------------------------------+ + | CPU Architecture | Operating System (OS) | Supported Version | + +=========================+=======================+=================================================+ + | x86 computing | Euler | Euler OS 2.5 | + +-------------------------+-----------------------+-------------------------------------------------+ + | | SLES | SUSE Linux Enterprise Server 12 SP4 (SUSE 12.4) | + +-------------------------+-----------------------+-------------------------------------------------+ + | | RedHat | Red Hat-7.5-x86_64 (Red Hat 7.5) | + +-------------------------+-----------------------+-------------------------------------------------+ + | | CentOS | CentOS-7.6 | + +-------------------------+-----------------------+-------------------------------------------------+ + | Kunpeng computing (Arm) | Euler | Euler OS 2.8 | + +-------------------------+-----------------------+-------------------------------------------------+ + | | CentOS | CentOS-7.6 | + +-------------------------+-----------------------+-------------------------------------------------+ + + In addition, sufficient disk space is allocated for the ECS, for example, **40GB**. + +- The ECS and the MRS cluster are in the same VPC. + +- The security group of the ECS must be the same as that of the Master node in an MRS cluster. + +- The NTP service has been installed on the ECS OS and is running properly. + + If the NTP service is not installed, run the **yum install ntp -y** command to install it when the **yum** source is configured. + +- A user can log in to the Linux ECS using the password (in SSH mode). + +Procedure +--------- + +#. Create an ECS that meets the requirements. +#. Perform NTP time synchronization to synchronize the time of nodes outside the cluster with the time of the MRS cluster. + + a. 
Run the **vi /etc/ntp.conf** command to edit the NTP client configuration file, add the IP address of the Master node in the MRS cluster, and comment out the IP addresses of other servers. + + .. code-block:: + + server master1_ip prefer + server master2_ip + + + .. figure:: /_static/images/en-us_image_0000001389636106.png + :alt: **Figure 1** Adding the Master node IP addresses + + **Figure 1** Adding the Master node IP addresses + + b. Run the **service ntpd stop** command to stop the NTP service. + + c. Run the **/usr/sbin/ntpdate** *IP address of the active Master node* command to manually synchronize the time. + + d. Run the **service ntpd start** or **systemctl restart ntpd** command to start the NTP service. + + e. Run the **ntpstat** command to check the time synchronization result. + +#. Perform the following steps to download the cluster client software package from FusionInsight Manager, copy the package to the ECS node, and install the client: + + a. Log in to FusionInsight Manager and download the cluster client to the specified directory on the active management node. + + b. Log in to the active management node as user **root**. + + **sudo su - omm** + + c. Run the following command to copy the client to the node where the client is to be installed: + + **scp -p /tmp/FusionInsight-Client/FusionInsight_Cluster_1_Services_Client.tar** *IP address of the node where the client is to be installed*\ **:/tmp** + + d. Log in to the node on which the client is to be installed as the client user. + + Run the following commands to install the client. If you do not have the file operation permission, change the file permissions as user **root**. + + **cd /tmp** + + **tar -xvf** **FusionInsight_Cluster_1_Services_Client.tar** + + **tar -xvf** **FusionInsight_Cluster_1_Services_ClientConfig.tar** + + **cd /tmp/FusionInsight_Cluster_1_Services_ClientConfig** + + **./install.sh /opt/mrsclient** + + e. Run the following commands to switch to the client directory and configure environment variables: + + **cd /opt/mrsclient** + + **source bigdata_env** + + f. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step: + + **kinit** *MRS cluster user* + + Example: **kinit admin** + + g. Run the client command of a component directly. + + For example, run the **hdfs dfs -ls /** command to view files in the HDFS root directory. diff --git a/doc/component-operation-guide-lts/source/change_history.rst b/doc/component-operation-guide-lts/source/change_history.rst new file mode 100644 index 0000000..e21eef5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/change_history.rst @@ -0,0 +1,12 @@ +:original_name: en-us_topic_0000001298722056.html + +.. _en-us_topic_0000001298722056: + +Change History +============== + +=========== ========================================= +Released On What's New +=========== ========================================= +2022-11-01 This issue is the first official release. +=========== ========================================= diff --git a/doc/component-operation-guide-lts/source/index.rst b/doc/component-operation-guide-lts/source/index.rst index b24c2c5..2ad8384 100644 --- a/doc/component-operation-guide-lts/source/index.rst +++ b/doc/component-operation-guide-lts/source/index.rst @@ -2,3 +2,28 @@ Map Reduce Service - Component Operation Guide (LTS) ==================================================== +..
toctree:: + :maxdepth: 1 + + using_carbondata/index + using_clickhouse/index + using_dbservice/index + using_flink/index + using_flume/index + using_hbase/index + using_hdfs/index + using_hetuengine/index + using_hive/index + using_hudi/index + using_hue/index + using_kafka/index + using_loader/index + using_mapreduce/index + using_oozie/index + using_ranger/index + using_spark2x/index + using_tez/index + using_yarn/index + using_zookeeper/index + appendix/index + change_history diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_access_control.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_access_control.rst new file mode 100644 index 0000000..b16fe75 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_access_control.rst @@ -0,0 +1,98 @@ +:original_name: mrs_01_1422.html + +.. _mrs_01_1422: + +CarbonData Access Control +========================= + +The following table provides details about Hive ACL permissions required for performing operations on CarbonData tables. + +Prerequisites +------------- + +Parameters listed in :ref:`Table 5 ` or :ref:`Table 6 ` have been configured. + +Hive ACL permissions +-------------------- + +.. table:: **Table 1** Hive ACL permissions required for CarbonData table-level operations + + +--------------------------------------+---------------------------------------------------------------------------------+ + | Scenario | Required Permission | + +======================================+=================================================================================+ + | DESCRIBE TABLE | SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | SELECT | SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | EXPLAIN | SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | CREATE TABLE | CREATE (of database) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | CREATE TABLE As SELECT | CREATE (on database), INSERT (on table), RW on data file, and SELECT (on table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | LOAD | INSERT (of table) RW on data file | + +--------------------------------------+---------------------------------------------------------------------------------+ + | DROP TABLE | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | DELETE SEGMENTS | DELETE (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | SHOW SEGMENTS | SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | CLEAN FILES | DELETE (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | INSERT OVERWRITE / INSERT INTO | INSERT (of table) RW on data file and SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ 
+ | CREATE INDEX | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | DROP INDEX | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | SHOW INDEXES | SELECT (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE ADD COLUMN | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE DROP COLUMN | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE CHANGE DATATYPE | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE RENAME | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE COMPACTION | INSERT (on table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | FINISH STREAMING | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE SET STREAMING PROPERTIES | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE SET TABLE PROPERTIES | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | UPDATE CARBON TABLE | UPDATE (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | DELETE RECORDS | DELETE (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | REFRESH TABLE | OWNER (of main table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | REGISTER INDEX TABLE | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | SHOW PARTITIONS | SELECT (on table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE ADD PARTITION | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + | ALTER TABLE DROP PARTITION | OWNER (of table) | + +--------------------------------------+---------------------------------------------------------------------------------+ + +.. note:: + + - If tables in the database are created by multiple users, the **Drop database** command fails to be executed even if the user who runs the command is the owner of the database. + + - In a secondary index, when the parent table is triggered, **insert** and **compaction** are triggered on the index table. 
If you select a query that has a filter condition that matches index table columns, you should provide selection permissions for the parent table and index table. + + - The LockFiles folder and lock files created in the LockFiles folder will have full permissions, as the LockFiles folder does not contain any sensitive data. + + - If you are using ACL, ensure you do not configure any path for DDL or DML which is being used by other process. You are advised to create new paths. + + Configure the path for the following configuration items: + + 1) carbon.badRecords.location + + 2) Db_Path and other items during database creation + + - For Carbon ACL in a non-security cluster, **hive.server2.enable.doAs** in the **hive-site.xml** file must be set to **false**. Then the query will run as the user who runs the hiveserver2 process. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_configure_unsafe_memory_in_carbondata.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_configure_unsafe_memory_in_carbondata.rst new file mode 100644 index 0000000..9048bd6 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_configure_unsafe_memory_in_carbondata.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1472.html + +.. _mrs_01_1472: + +How Do I Configure Unsafe Memory in CarbonData? +=============================================== + +Question +-------- + +How do I configure unsafe memory in CarbonData? + +Answer +------ + +In the Spark configuration, the value of **spark.yarn.executor.memoryOverhead** must be greater than the sum of (**sort.inmemory.size.inmb** + **Netty offheapmemory required**), or the sum of (**carbon.unsafe.working.memory.in.mb** + **carbon.sort.inememory.storage.size.in.mb** + **Netty offheapmemory required**). Otherwise, if off-heap access exceeds the configured executor memory, Yarn may stop the executor. + +If **spark.shuffle.io.preferDirectBufs** is set to **true**, the netty transfer service in Spark takes off some heap memory (around 384 MB or 0.1 x executor memory) from **spark.yarn.executor.memoryOverhead**. + +For details, see :ref:`Configuring Executor Off-Heap Memory `. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_logically_split_data_across_different_namespaces.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_logically_split_data_across_different_namespaces.rst new file mode 100644 index 0000000..dad836f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_do_i_logically_split_data_across_different_namespaces.rst @@ -0,0 +1,67 @@ +:original_name: mrs_01_1469.html + +.. _mrs_01_1469: + +How Do I Logically Split Data Across Different Namespaces? +========================================================== + +Question +-------- + +How do I logically split data across different namespaces? + +Answer +------ + +- Configuration: + + To logically split data across different namespaces, you must update the following configuration in the **core-site.xml** file of HDFS, Hive, and Spark. + + .. note:: + + Changing the Hive component will change the locations of carbonstore and warehouse. + + - Configuration in HDFS + + - **fs.defaultFS**: Name of the default file system. The URI mode must be set to **viewfs**. When **viewfs** is used, the permission part must be **ClusterX**. 
+ - **fs.viewfs.mounttable.ClusterX.homedir**: Home directory base path. You can use the getHomeDirectory() method defined in **FileSystem/FileContext** to access the home directory. + - **fs.viewfs.mounttable.default.link.**\ *<dir_name>*: ViewFS mount table entry. + + Example: + + .. code-block:: + + <property> + <name>fs.defaultFS</name> + <value>viewfs://ClusterX/</value> + </property> + <property> + <name>fs.viewfs.mounttable.ClusterX.link./folder1</name> + <value>hdfs://NS1/folder1</value> + </property> + <property> + <name>fs.viewfs.mounttable.ClusterX.link./folder2</name> + <value>hdfs://NS2/folder2</value> + </property> + + - Configurations in Hive and Spark + + **fs.defaultFS**: Name of the default file system. The URI mode must be set to **viewfs**. When **viewfs** is used, the permission part must be **ClusterX**. + +- Syntax: + + **LOAD DATA INPATH** *'path to data' INTO TABLE table_name OPTIONS ``('...');``* + + .. note:: + + When Spark is configured with the viewFS file system and attempts to load data from HDFS, users must specify a path such as **viewfs://** or a relative path as the file path in the **LOAD** statement. + +- Example: + + - Sample viewFS path: + + **LOAD DATA INPATH** *'viewfs://ClusterX/dir/data.csv' INTO TABLE table_name OPTIONS ``('...');``* + + - Sample relative path: + + **LOAD DATA INPATH** *'/apps/input_data1.txt'* **INTO TABLE** *table_name*; diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_avoid_minor_compaction_for_historical_data.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_avoid_minor_compaction_for_historical_data.rst new file mode 100644 index 0000000..69ff941 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_avoid_minor_compaction_for_historical_data.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_1459.html + +.. _mrs_01_1459: + +How to Avoid Minor Compaction for Historical Data? +================================================== + +Question +-------- + +How to avoid minor compaction for historical data? + +Answer +------ + +If you want to load historical data first and then the incremental data, perform the following steps to avoid minor compaction of historical data: + +#. Load all historical data. +#. Configure the major compaction size to a value smaller than the segment size of the historical data. +#. Run major compaction once on the historical data so that these segments will not be considered later for minor compaction. +#. Load the incremental data. +#. Configure the minor compaction threshold as required. + +For example (a command-level sketch of these steps is provided after the list): + +#. Assume that you have loaded all historical data to CarbonData and the size of each segment is 500 GB. +#. Set the major compaction threshold **carbon.major.compaction.size** to **491520** (480 GB x 1024). +#. Run major compaction. All segments will be compacted because the size of each segment is more than the configured size. +#. Perform incremental loading. +#. Configure the minor compaction threshold **carbon.compaction.level.threshold** to **6,6**. +#. Run minor compaction. As a result, only incremental data is compacted.
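+ +The following is a minimal command-level sketch of the preceding steps, for reference only. The client directory (**/opt/client**), the table name (**db1.sales_history**), the location of the **carbon.properties** file, and the use of **spark-beeline -e** are assumptions; adapt them to your environment. + + .. code-block:: + + # Load the Spark client environment and authenticate (skip kinit if Kerberos authentication is disabled). + source /opt/client/bigdata_env + kinit admin + + # Set the major compaction threshold below the historical segment size (480 GB x 1024 = 491520), + # then compact the historical segments once so that later minor compaction skips them. + echo "carbon.major.compaction.size=491520" >> /opt/client/Spark2x/spark/conf/carbon.properties + spark-beeline -e "ALTER TABLE db1.sales_history COMPACT 'MAJOR';" + + # After loading the incremental data, set the minor compaction threshold and run minor compaction. + echo "carbon.compaction.level.threshold=6,6" >> /opt/client/Spark2x/spark/conf/carbon.properties + spark-beeline -e "ALTER TABLE db1.sales_history COMPACT 'MINOR';"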
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_change_the_default_group_name_for_carbondata_data_loading.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_change_the_default_group_name_for_carbondata_data_loading.rst new file mode 100644 index 0000000..431f7e8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/how_to_change_the_default_group_name_for_carbondata_data_loading.rst @@ -0,0 +1,19 @@ +:original_name: mrs_01_1460.html + +.. _mrs_01_1460: + +How to Change the Default Group Name for CarbonData Data Loading? +================================================================= + +Question +-------- + +How to change the default group name for CarbonData data loading? + +Answer +------ + +By default, the group name for CarbonData data loading is **ficommon**. You can perform the following operation to change the default group name: + +#. Edit the **carbon.properties** file. +#. Change the value of the key **carbon.dataload.group.name** as required. The default value is **ficommon**. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/index.rst new file mode 100644 index 0000000..51418e8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/index.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_1457.html + +.. _mrs_01_1457: + +CarbonData FAQ +============== + +- :ref:`Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values? ` +- :ref:`How to Avoid Minor Compaction for Historical Data? ` +- :ref:`How to Change the Default Group Name for CarbonData Data Loading? ` +- :ref:`Why Does INSERT INTO CARBON TABLE Command Fail? ` +- :ref:`Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters? ` +- :ref:`Why Data Load Performance Decreases due to Bad Records? ` +- :ref:`Why INSERT INTO/LOAD DATA Task Distribution Is Incorrect and the Opened Tasks Are Less Than the Available Executors when the Number of Initial Executors Is Zero? ` +- :ref:`Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed? ` +- :ref:`Why Data loading Fails During off heap? ` +- :ref:`Why Do I Fail to Create a Hive Table? ` +- :ref:`Why CarbonData tables created in V100R002C50RC1 not reflecting the privileges provided in Hive Privileges for non-owner? ` +- :ref:`How Do I Logically Split Data Across Different Namespaces? ` +- :ref:`Why Missing Privileges Exception is Reported When I Perform Drop Operation on Databases? ` +- :ref:`Why the UPDATE Command Cannot Be Executed in Spark Shell? ` +- :ref:`How Do I Configure Unsafe Memory in CarbonData? ` +- :ref:`Why Exception Occurs in CarbonData When Disk Space Quota is Set for Storage Directory in HDFS? ` +- :ref:`Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed? ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + why_is_incorrect_output_displayed_when_i_perform_query_with_filter_on_decimal_data_type_values + how_to_avoid_minor_compaction_for_historical_data + how_to_change_the_default_group_name_for_carbondata_data_loading + why_does_insert_into_carbon_table_command_fail + why_is_the_data_logged_in_bad_records_different_from_the_original_input_data_with_escape_characters + why_data_load_performance_decreases_due_to_bad_records + why_insert_into_load_data_task_distribution_is_incorrect_and_the_opened_tasks_are_less_than_the_available_executors_when_the_number_of_initial_executors_is_zero + why_does_carbondata_require_additional_executors_even_though_the_parallelism_is_greater_than_the_number_of_blocks_to_be_processed + why_data_loading_fails_during_off_heap + why_do_i_fail_to_create_a_hive_table + why_carbondata_tables_created_in_v100r002c50rc1_not_reflecting_the_privileges_provided_in_hive_privileges_for_non-owner + how_do_i_logically_split_data_across_different_namespaces + why_missing_privileges_exception_is_reported_when_i_perform_drop_operation_on_databases + why_the_update_command_cannot_be_executed_in_spark_shell + how_do_i_configure_unsafe_memory_in_carbondata + why_exception_occurs_in_carbondata_when_disk_space_quota_is_set_for_storage_directory_in_hdfs + why_does_data_query_or_loading_fail_and_org.apache.carbondata.core.memory.memoryexception_not_enough_memory_is_displayed diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_carbondata_tables_created_in_v100r002c50rc1_not_reflecting_the_privileges_provided_in_hive_privileges_for_non-owner.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_carbondata_tables_created_in_v100r002c50rc1_not_reflecting_the_privileges_provided_in_hive_privileges_for_non-owner.rst new file mode 100644 index 0000000..516cbc0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_carbondata_tables_created_in_v100r002c50rc1_not_reflecting_the_privileges_provided_in_hive_privileges_for_non-owner.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_1468.html + +.. _mrs_01_1468: + +Why CarbonData tables created in V100R002C50RC1 not reflecting the privileges provided in Hive Privileges for non-owner? +======================================================================================================================== + +Question +-------- + +Why CarbonData tables created in V100R002C50RC1 not reflecting the privileges provided in Hive Privileges for non-owner? + +Answer +------ + +The Hive ACL is implemented after the version V100R002C50RC1, hence the Hive ACL Privileges are not reflecting. + +To support HIVE ACL Privileges for CarbonData tables created in V100R002C50RC1, following two ALTER TABLE commands must be executed by owner of the table. 
+ +**ALTER TABLE** *$dbname.$tablename SET LOCATION '$carbon.store/$dbname/$tablename';* + +**ALTER TABLE** *$dbname.$tablename SET SERDEPROPERTIES ('path'='$carbon.store/$dbname/$tablename');* + +Example: + +Assume that the database name is 'carbondb', the table name is 'carbontable', and the CarbonData store location is 'hdfs://hacluster/user/hive/warehouse/carbon.store'. The commands to be executed are as follows: + +**ALTER TABLE** *carbondb.carbontable SET LOCATION 'hdfs://hacluster/user/hive/warehouse/carbon.store/carbondb/carbontable';* + +**ALTER TABLE** *carbondb.carbontable SET SERDEPROPERTIES ('path'='hdfs://hacluster/user/hive/warehouse/carbon.store/carbondb/carbontable');* diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_load_performance_decreases_due_to_bad_records.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_load_performance_decreases_due_to_bad_records.rst new file mode 100644 index 0000000..3f65d95 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_load_performance_decreases_due_to_bad_records.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1463.html + +.. _mrs_01_1463: + +Why Data Load Performance Decreases due to Bad Records? +======================================================= + +Question +-------- + +Why does data load performance decrease due to bad records? + +Answer +------ + +If bad records are present in the data and **BAD_RECORDS_LOGGER_ENABLE** is set to **true** or **BAD_RECORDS_ACTION** is set to **redirect**, load performance decreases because of the extra I/O required to write the failure reason to the log file or to redirect the records to a raw CSV file. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_loading_fails_during_off_heap.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_loading_fails_during_off_heap.rst new file mode 100644 index 0000000..787b78c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_data_loading_fails_during_off_heap.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1466.html + +.. _mrs_01_1466: + +Why Data loading Fails During off heap? +======================================= + +Question +-------- + +Why does data loading fail when off-heap memory is used? + +Answer +------ + +The YARN ResourceManager treats (Java heap memory + **spark.yarn.am.memoryOverhead**) as the memory limit, and off-heap usage can push the total memory beyond this limit. In this case, increase the value of the **spark.yarn.am.memoryOverhead** parameter. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_do_i_fail_to_create_a_hive_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_do_i_fail_to_create_a_hive_table.rst new file mode 100644 index 0000000..b97cd92 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_do_i_fail_to_create_a_hive_table.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_1467.html + +.. _mrs_01_1467: + +Why Do I Fail to Create a Hive Table? +===================================== + +Question +-------- + +Why do I fail to create a Hive table? + +Answer +------ + +Creating a Hive table may fail when the source table or subquery has a large number of partitions. Such a query requires a large number of tasks and outputs a large number of files, which causes an out-of-memory (OOM) error in the driver.
+ +This can be solved by using **distribute by** on a column with suitable cardinality (number of distinct values) in the Hive table creation statement. + +The **distribute by** clause limits the number of Hive table partitions. The number of non-empty output files is determined by the cardinality of the given column or by **spark.sql.shuffle.partitions**, whichever is smaller. For example, if **spark.sql.shuffle.partitions** is 200 but the cardinality of the column is 100, 200 output files are created but 100 of them are empty. Using a column with very low cardinality (for example, 1) causes data skew and affects the distribution of later queries. + +Therefore, you are advised to use a column whose cardinality is greater than **spark.sql.shuffle.partitions**; two to three times greater is reasonable. + +Example: + +**create table hivetable1 as select \* from sourcetable1 distribute by col_age;** diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_carbondata_require_additional_executors_even_though_the_parallelism_is_greater_than_the_number_of_blocks_to_be_processed.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_carbondata_require_additional_executors_even_though_the_parallelism_is_greater_than_the_number_of_blocks_to_be_processed.rst new file mode 100644 index 0000000..21fe074 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_carbondata_require_additional_executors_even_though_the_parallelism_is_greater_than_the_number_of_blocks_to_be_processed.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1465.html + +.. _mrs_01_1465: + +Why Does CarbonData Require Additional Executors Even Though the Parallelism Is Greater Than the Number of Blocks to Be Processed? +================================================================================================================================== + +Question +-------- + +Why does CarbonData require additional executors even though the parallelism is greater than the number of blocks to be processed? + +Answer +------ + +CarbonData block distribution optimizes data processing as follows: + +#. Optimize data processing parallelism. +#. Optimize parallel reading of block data. + +To optimize parallel processing and parallel read, CarbonData requests executors based on the locality of blocks so that it can obtain executors on all nodes. + +If you are using dynamic allocation, configure the following properties (a configuration sketch follows this list): + +#. Set **spark.dynamicAllocation.executorIdleTimeout** to 15 minutes (or the average query time). +#. Set **spark.dynamicAllocation.maxExecutors** correctly. The default value **2048** is not recommended. Otherwise, CarbonData will request the maximum number of executors. +#. For a larger cluster, set **carbon.dynamicAllocation.schedulerTimeout** to a value ranging from 10 to 15 seconds. The default value is 5 seconds. +#. Set **carbon.scheduler.minRegisteredResourcesRatio** to a value ranging from 0.1 to 1.0. The default value is **0.8**. Block distribution can be started as long as the value of **carbon.scheduler.minRegisteredResourcesRatio** is within the range.
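+ +The following is a minimal configuration sketch for the properties listed above. The client path (**/opt/client**) and the specific values are assumptions for illustration; tune them to your cluster. The settings apply to newly started Spark applications. + + .. code-block:: + + SPARK_CONF=/opt/client/Spark2x/spark/conf + + # Spark dynamic allocation settings (spark-defaults.conf) + echo "spark.dynamicAllocation.executorIdleTimeout=900s" >> ${SPARK_CONF}/spark-defaults.conf + echo "spark.dynamicAllocation.maxExecutors=64" >> ${SPARK_CONF}/spark-defaults.conf + + # CarbonData scheduler settings (carbon.properties) + echo "carbon.dynamicAllocation.schedulerTimeout=15" >> ${SPARK_CONF}/carbon.properties + echo "carbon.scheduler.minRegisteredResourcesRatio=0.8" >> ${SPARK_CONF}/carbon.properties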
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_data_query_or_loading_fail_and_org.apache.carbondata.core.memory.memoryexception_not_enough_memory_is_displayed.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_data_query_or_loading_fail_and_org.apache.carbondata.core.memory.memoryexception_not_enough_memory_is_displayed.rst new file mode 100644 index 0000000..2a9db73 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_data_query_or_loading_fail_and_org.apache.carbondata.core.memory.memoryexception_not_enough_memory_is_displayed.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1474.html + +.. _mrs_01_1474: + +Why Does Data Query or Loading Fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" Is Displayed? +============================================================================================================================ + +Question +-------- + +Why does data query or loading fail and "org.apache.carbondata.core.memory.MemoryException: Not enough memory" is displayed? + +Answer +------ + +This exception is thrown when the off-heap memory required for data query and loading in the executor is insufficient. + +In this case, increase the values of **carbon.unsafe.working.memory.in.mb** and **spark.yarn.executor.memoryOverhead**. + +For details, see :ref:`How Do I Configure Unsafe Memory in CarbonData? `. + +The memory is shared by data query and loading. Therefore, if the loading and query operations need to be performed at the same time, you are advised to set **carbon.unsafe.working.memory.in.mb** and **spark.yarn.executor.memoryOverhead** to a value greater than 2,048 MB. + +The following formulas can be used for estimation: + +Memory required for data loading: + +carbon.number.of.cores.while.loading [default value: 6] x Number of tables to load in parallel x (offheap.sort.chunk.size.inmb [default value: 64 MB] + carbon.blockletgroup.size.in.mb [default value: 64 MB] + Current compaction ratio [64 MB/3.5]) + += Around 900 MB per table + +Memory required for data query: + +(SPARK_EXECUTOR_INSTANCES [default value: 2] x (carbon.blockletgroup.size.in.mb [default value: 64 MB] + carbon.blockletgroup.size.in.mb [default value: 64 MB] x 3.5) x Number of cores per executor [default value: 1]) + += ~ 600 MB diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_insert_into_carbon_table_command_fail.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_insert_into_carbon_table_command_fail.rst new file mode 100644 index 0000000..10e8d18 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_does_insert_into_carbon_table_command_fail.rst @@ -0,0 +1,54 @@ +:original_name: mrs_01_1461.html + +.. _mrs_01_1461: + +Why Does INSERT INTO CARBON TABLE Command Fail? +=============================================== + +Question +-------- + +Why does the **INSERT INTO CARBON TABLE** command fail and the following error message is displayed? + +.. code-block:: + + Data load failed due to bad record + +Answer +------ + +The **INSERT INTO CARBON TABLE** command fails in the following scenarios: + +- If the data types of the source and target table columns are not the same, the data from the source table is treated as bad records and the **INSERT INTO** command fails.
+ +- If the result of an aggregation function on a source column exceeds the maximum range of the target column, the **INSERT INTO** command fails. + + Solution: + + You can use the cast function on the corresponding columns when inserting records. + + For example: + + #. Run the **DESCRIBE** command to query the target and source tables. + + **DESCRIBE** *newcarbontable*; + + Result: + + .. code-block:: + + col1 int + col2 bigint + + **DESCRIBE** *sourcetable*; + + Result: + + .. code-block:: + + col1 int + col2 int + + #. Add the cast function to convert the bigint value to integer. + + **INSERT INTO** *newcarbontable select col1, cast(col2 as integer) from sourcetable;* diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_exception_occurs_in_carbondata_when_disk_space_quota_is_set_for_storage_directory_in_hdfs.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_exception_occurs_in_carbondata_when_disk_space_quota_is_set_for_storage_directory_in_hdfs.rst new file mode 100644 index 0000000..721f1d5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_exception_occurs_in_carbondata_when_disk_space_quota_is_set_for_storage_directory_in_hdfs.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1473.html + +.. _mrs_01_1473: + +Why Exception Occurs in CarbonData When Disk Space Quota is Set for Storage Directory in HDFS? +============================================================================================== + +Question +-------- + +Why does an exception occur in CarbonData when a disk space quota is set for the storage directory in HDFS? + +Answer +------ + +Data is written to HDFS when you create a table, load data, update a table, and so on. If the configured HDFS directory does not have a sufficient disk space quota, the operation fails and throws the following exception. + +.. code-block:: + + org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: + The DiskSpace quota of /user/tenant is exceeded: + quota = 314572800 B = 300 MB but diskspace consumed = 402653184 B = 384 MB at + org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211) at + org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239) at + org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:941) at + org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:745) + +If such an exception occurs, configure a sufficient disk space quota for the tenant (see the command sketch at the end of this answer). + +For example: + +If the HDFS replication factor is 3 and the HDFS default block size is 128 MB, then at least 384 MB (number of blocks x block size x replication factor of the schema file = 1 x 128 MB x 3 = 384 MB) of disk space quota is required to write a table schema file to HDFS. + +.. note:: + + For fact files, as the default block size is 1024 MB, a minimum of 3072 MB of space is required per fact file for data loading.
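+ +The following is a minimal command sketch for checking and enlarging the quota. The directory **/user/tenant** is taken from the exception above and the quota value is only an example; run the commands as an HDFS administrator. + + .. code-block:: + + # View the current space quota and usage of the tenant directory. + hdfs dfs -count -q -h /user/tenant + + # Set a space quota large enough for the data to be written (example value). + hdfs dfsadmin -setSpaceQuota 10g /user/tenant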
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_insert_into_load_data_task_distribution_is_incorrect_and_the_opened_tasks_are_less_than_the_available_executors_when_the_number_of_initial_executors_is_zero.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_insert_into_load_data_task_distribution_is_incorrect_and_the_opened_tasks_are_less_than_the_available_executors_when_the_number_of_initial_executors_is_zero.rst new file mode 100644 index 0000000..055744c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_insert_into_load_data_task_distribution_is_incorrect_and_the_opened_tasks_are_less_than_the_available_executors_when_the_number_of_initial_executors_is_zero.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_1464.html + +.. _mrs_01_1464: + +Why INSERT INTO/LOAD DATA Task Distribution Is Incorrect and the Opened Tasks Are Less Than the Available Executors when the Number of Initial Executors Is Zero? +================================================================================================================================================================= + +Question +-------- + +Why is the **INSERT INTO** or **LOAD DATA** task distribution incorrect, and why are the opened tasks fewer than the available executors when the number of initial executors is zero? + +Answer +------ + +For **INSERT INTO** or **LOAD DATA**, CarbonData distributes one task per node. If the executors are not allocated on distinct nodes, CarbonData launches fewer tasks. + +**Solution**: + +Configure a higher value for the executor memory and cores so that YARN launches only one executor per node. + +#. Configure the number of executor cores. + + - Configure **spark.executor.cores** in **spark-defaults.conf** or **SPARK_EXECUTOR_CORES** in **spark-env.sh** appropriately. + - Add the **--executor-cores NUM** parameter to configure the cores when using the **spark-submit** command. + +#. Configure the executor memory. + + - Configure **spark.executor.memory** in **spark-defaults.conf** or **SPARK_EXECUTOR_MEMORY** in **spark-env.sh** appropriately. + - Add the **--executor-memory MEM** parameter to configure the memory when using the **spark-submit** command. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_incorrect_output_displayed_when_i_perform_query_with_filter_on_decimal_data_type_values.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_incorrect_output_displayed_when_i_perform_query_with_filter_on_decimal_data_type_values.rst new file mode 100644 index 0000000..f6751b8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_incorrect_output_displayed_when_i_perform_query_with_filter_on_decimal_data_type_values.rst @@ -0,0 +1,45 @@ +:original_name: mrs_01_1458.html + +.. _mrs_01_1458: + +Why Is Incorrect Output Displayed When I Perform Query with Filter on Decimal Data Type Values? +=============================================================================================== + +Question +-------- + +Why is incorrect output displayed when I perform a query with a filter on decimal data type values? + +For example: + +**select \* from carbon_table where num = 1234567890123456.22;** + +Output: + +..
code-block:: + + +------+---------------------+--+ + | name | num | + +------+---------------------+--+ + | IAA | 1234567890123456.22 | + | IAA | 1234567890123456.21 | + +------+---------------------+--+ + +Answer +------ + +To obtain accurate output, append BD to the number. + +For example: + +**select \* from carbon_table where num = 1234567890123456.22BD;** + +Output: + +.. code-block:: + + +------+---------------------+--+ + | name | num | + +------+---------------------+--+ + | IAA | 1234567890123456.22 | + +------+---------------------+--+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_the_data_logged_in_bad_records_different_from_the_original_input_data_with_escape_characters.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_the_data_logged_in_bad_records_different_from_the_original_input_data_with_escape_characters.rst new file mode 100644 index 0000000..0373e85 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_is_the_data_logged_in_bad_records_different_from_the_original_input_data_with_escape_characters.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1462.html + +.. _mrs_01_1462: + +Why Is the Data Logged in Bad Records Different from the Original Input Data with Escape Characters? +==================================================================================================== + +Question +-------- + +Why is the data logged in bad records different from the original input data with escape characters? + +Answer +------ + +An escape character is a backslash (\\) followed by one or more characters. If the input records contain escape characters such as \\t, \\b, \\n, \\r, \\f, \\', \\", \\\\ , Java processes the escape character '\\' and the following characters together to obtain the escaped meaning. + +For example, if the CSV data **2010\\\\10,test** is inserted into a (String, int) column pair, the value is treated as a bad record because **test** cannot be converted to int. The value logged in the bad records is **2010\\10** because Java processes **\\\\** as **\\**. A sketch showing how to redirect and locate such bad records is provided below.
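+ +The following is a minimal sketch showing how such bad records can be redirected during loading and then located. The table name, the input path, and the use of **spark-beeline -e** are assumptions; the directory that stores the redirected records is determined by the **carbon.badRecords.location** property. + + .. code-block:: + + # Redirect bad records instead of failing the load, so that they can be inspected later. + spark-beeline -e "LOAD DATA INPATH 'hdfs://hacluster/data/input.csv' INTO TABLE db1.t1 OPTIONS('DELIMITER'=',', 'BAD_RECORDS_LOGGER_ENABLE'='true', 'BAD_RECORDS_ACTION'='REDIRECT');" + + # Inspect the redirected records under the configured bad-records location (assumed value of carbon.badRecords.location). + hdfs dfs -ls -R /tmp/carbon/badrecords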
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_the_update_command_cannot_be_executed_in_spark_shell.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_the_update_command_cannot_be_executed_in_spark_shell.rst new file mode 100644 index 0000000..3b03b17 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_faq/why_the_update_command_cannot_be_executed_in_spark_shell.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_1471.html + +.. _mrs_01_1471: + +Why the UPDATE Command Cannot Be Executed in Spark Shell? +========================================================= + +Question +-------- + +Why cannot the UPDATE command be executed in Spark Shell? + +Answer +------ + +The syntax and examples provided in this document are based on Beeline commands rather than Spark Shell commands. + +To run the UPDATE command in Spark Shell, use the following syntax (<carbon_context>, <CARBON TABLE>, and <filter_condition> are placeholders): + +- Syntax 1 + + **<carbon_context>.sql("UPDATE <CARBON TABLE> SET (column_name1, column_name2, ... column_name n) = (column1_expression , column2_expression , column3_expression ... column n_expression) [ WHERE { <filter_condition> } ];").show** + +- Syntax 2 + + **<carbon_context>.sql("UPDATE <CARBON TABLE> SET (column_name1, column_name2,) = (select sourceColumn1, sourceColumn2 from sourceTable [ WHERE { <filter_condition> } ] ) [ WHERE { <filter_condition> } ];").show** + +Example: + +If the context of CarbonData is **carbon**, run the following command: + +**carbon.sql("update carbonTable1 d set (d.column3,d.column5) = (select s.c33 ,s.c55 from sourceTable1 s where d.column1 = s.c11) where d.column1 = 'country' exists( select \* from table3 o where o.c2 > 1);").show** diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_data_migration.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_data_migration.rst new file mode 100644 index 0000000..844b5c2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_data_migration.rst @@ -0,0 +1,84 @@ +:original_name: mrs_01_1416.html + +.. _mrs_01_1416: + +CarbonData Data Migration +========================= + +Scenario +-------- + +If you want to rapidly migrate CarbonData data from one cluster to another, you can use the CarbonData backup and restoration commands. This method does not require data import in the target cluster, reducing the required migration time. + +Prerequisites +------------- + +The Spark2x client has been installed in a directory, for example, **/opt/client**, in both clusters. The source cluster is cluster A, and the target cluster is cluster B. + +Procedure +--------- + +#. Log in to the node where the client is installed in cluster A as a client installation user. + +#. Run the following commands to configure environment variables: + + **source /opt/ficlient/bigdata_env** + + **source /opt/ficlient/Spark2x/component_env** + +#. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip user authentication. + + **kinit** *carbondatauser* + + *carbondatauser* indicates the user of the original data. That is, the user has the read and write permissions for the tables. + +#. Run the following command to connect to the database and check the location for storing table data on HDFS: + + **spark-beeline** + + **desc formatted** *Name of the table containing the original data*\ **;** + + **Location** in the displayed information indicates the directory where the data file resides. + +#.
Log in to the node where the client is installed in cluster B as a client installation user and configure the environment variables: + + **source /opt/ficlient/bigdata_env** + + **source /opt/ficlient/Spark2x/component_env** + +#. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip user authentication. + + **kinit** *carbondatauser2* + + *carbondatauser2* indicates the user that uploads data. + +#. Run the **spark-beeline** command to connect to the database. + +#. Does the database that maps to the original data exist? + + - If yes, go to :ref:`9 `. + - If no, create a database with the same name and go to :ref:`9 `. + +#. .. _mrs_01_1416__en-us_topic_0000001219230605_lb95e9d29c6fc469a8375f190f4136467: + + Copy the original data from the HDFS directory in cluster A to the HDFS directory in cluster B. + + When uploading data in cluster B, ensure that the upload directory has the directories with the same names as the database and table in the original directory and the upload user has the permission to write data to the upload directory. After the data is uploaded, the user has the permission to read and write the data. + + For example, if the original data is stored in **/user/carboncadauser/warehouse/db1/tb1**, the data can be stored in **/user/carbondatauser2/warehouse/db1/tb1** in the new cluster. + +#. .. _mrs_01_1416__en-us_topic_0000001219230605_laf7ce95fc3cc4ab2a96640541690ed30: + + In the client environment of cluster B, run the following command to generate the metadata associated with the table corresponding to the original data in Hive: + + **REFRESH TABLE** *$dbName.$tbName*\ **;** + + *$dbName* indicates the database name, and *$tbName* indicates the table name. + +#. If the original table contains an index table, perform :ref:`9 ` and :ref:`10 ` to migrate the index table directory from cluster A to cluster B. + +#. Run the following command to register an index table for the CarbonData table (skip this step if no index table is created for the original table): + + **REGISTER INDEX TABLE** *$tableName* ON *$maintable*; + + *$tableName* indicates the index table name, and *$maintable* indicates the table name. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_quick_start.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_quick_start.rst new file mode 100644 index 0000000..17802b7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_quick_start.rst @@ -0,0 +1,135 @@ +:original_name: mrs_01_1406.html + +.. _mrs_01_1406: + +CarbonData Quick Start +====================== + +This section describes how to create CarbonData tables, load data, and query data. This quick start provides operations based on the Spark Beeline client. If you want to use Spark shell, wrap the queries with **spark.sql()**. + +.. 
table:: **Table 1** CarbonData Quick Start + + +----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+ + | Operation | Description | + +============================================================================================================================+=======================================================================+ + | :ref:`Connecting to CarbonData ` | Connect to CarbonData before performing any operations on CarbonData. | + +----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+ + | :ref:`Creating a CarbonData Table ` | Create a CarbonData table to load data and perform query operations. | + +----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+ + | :ref:`Loading Data to a CarbonData Table ` | Load data from CSV to the created table. | + +----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+ + | :ref:`Querying Data from a CarbonData Table ` | Perform query operations such as filters and groupby. | + +----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+ + +.. _mrs_01_1406__en-us_topic_0000001219029123_s2af9b9318a0f44c48f3b0fa8217a12fe: + +Connecting to CarbonData +------------------------ + +- Use Spark SQL or Spark shell to connect to Spark and run Spark SQL commands. +- Start the JDBCServer and use a JDBC client (for example, Spark Beeline) to connect to the JDBCServer. + +.. note:: + + A user must belong to the data loading group for performing data loading operations. The default name of the data loading group is **ficommon**. + +.. _mrs_01_1406__en-us_topic_0000001219029123_sffd808b54dab44fc8613a01cc8e39baf: + +Creating a CarbonData Table +--------------------------- + +After connecting Spark Beeline with the JDBCServer, create a CarbonData table to load data and perform query operations. Run the following commands to create a simple table: + +.. code-block:: + + create table x1 (imei string, deviceInformationId int, mac string, productdate timestamp, updatetime timestamp, gamePointId double, contractNumber double) STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='imei,mac'); + +The command output is as follows: + +.. code-block:: + + +---------+--+ + | result | + +---------+--+ + +---------+--+ + No rows selected (1.551 seconds) + +.. _mrs_01_1406__en-us_topic_0000001219029123_s79b594787a2a46819cf07478d4a0d81c: + +Loading Data to a CarbonData Table +---------------------------------- + +After you have created a CarbonData table, you can load the data from CSV to the created table. + +**Loading data from a CSV file to a CarbonData table** + +Run the following command with required parameters to load data from CSV. The column names of the CarbonData table must match the column names of the CSV file. 
+
+**LOAD DATA inpath 'hdfs://hacluster/data/**\ *x1_without_header.csv*\ **' into table** *x1* **options('DELIMITER'=',', 'QUOTECHAR'='"','FILEHEADER'='imei, deviceinformationid,mac, productdate,updatetime, gamepointid,contractnumber');**
+
+In the preceding command, **x1_without_header.csv** and **x1** are used as examples.
+
+The CSV example file is as follows:
+
+.. code-block::
+
+   13418592122,1001, MAC address, 2017-10-23 15:32:30,2017-10-24 15:32:30,62.50,74.56
+   13418592123,1002, MAC address, 2017-10-23 16:32:30,2017-10-24 16:32:30,17.80,76.28
+   13418592124,1003, MAC address, 2017-10-23 17:32:30,2017-10-24 17:32:30,20.40,92.94
+   13418592125,1004, MAC address, 2017-10-23 18:32:30,2017-10-24 18:32:30,73.84,8.58
+   13418592126,1005, MAC address, 2017-10-23 19:32:30,2017-10-24 19:32:30,80.50,88.02
+   13418592127,1006, MAC address, 2017-10-23 20:32:30,2017-10-24 20:32:30,65.77,71.24
+   13418592128,1007, MAC address, 2017-10-23 21:32:30,2017-10-24 21:32:30,75.21,76.04
+   13418592129,1008, MAC address, 2017-10-23 22:32:30,2017-10-24 22:32:30,63.30,94.40
+   13418592130,1009, MAC address, 2017-10-23 23:32:30,2017-10-24 23:32:30,95.51,50.17
+   13418592131,1010, MAC address, 2017-10-24 00:32:30,2017-10-25 00:32:30,39.62,99.13
+
+The command output is as follows:
+
+.. code-block::
+
+   +---------+--+
+   | Result  |
+   +---------+--+
+   +---------+--+
+   No rows selected (3.039 seconds)
+
+.. _mrs_01_1406__en-us_topic_0000001219029123_s68e9413d1b234b2d91557a1739fc7828:
+
+Querying Data from a CarbonData Table
+-------------------------------------
+
+After a CarbonData table is created and the data is loaded, you can perform query operations as required. Some query operations are provided as examples.
+
+- **Obtaining the number of records**
+
+  Run the following command to obtain the number of records in the CarbonData table:
+
+  **select count(*) from x1;**
+
+- **Querying with the groupby condition**
+
+  Run the following command to obtain the distinct **deviceinformationid** records in the CarbonData table:
+
+  **select deviceinformationid,count(distinct deviceinformationid) from x1 group by deviceinformationid;**
+
+- **Querying with the filter condition**
+
+  Run the following command to obtain specific **deviceinformationid** records:
+
+  **select \* from x1 where deviceinformationid='1010';**
+
+.. note::
+
+   If the query result contains non-English characters, the columns in the result may not be aligned. This is because characters of different languages occupy different display widths.
+
+Using CarbonData on Spark-shell
+-------------------------------
+
+If you need to use CarbonData on Spark-shell, create a CarbonData table, load data to the table, and query the data as follows:
+
+.. code-block::
+
+   spark.sql("CREATE TABLE x2(imei string, deviceInformationId int, mac string, productdate timestamp, updatetime timestamp, gamePointId double, contractNumber double) STORED AS carbondata")
+   spark.sql("LOAD DATA inpath 'hdfs://hacluster/data/x1_without_header.csv' into table x2 options('DELIMITER'=',', 'QUOTECHAR'='\"','FILEHEADER'='imei, deviceinformationid,mac, productdate,updatetime, gamepointid,contractnumber')")
+   spark.sql("SELECT * FROM x2").show()
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/combining_segments.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/combining_segments.rst new file mode 100644 index 0000000..825410e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/combining_segments.rst @@ -0,0 +1,89 @@
+:original_name: mrs_01_1415.html
+
+.. _mrs_01_1415:
+
+Combining Segments
+==================
+
+Scenario
+--------
+
+Frequent data loading results in a large number of fragmented CarbonData files in the storage directory. During each data load, the data is sorted and indexed, so an index is generated for each load. As the number of loads increases, so does the number of indexes, and because each index covers only one load, index performance decreases. CarbonData provides a compaction function for loaded data. During compaction, the data in each segment is combined and sorted, and multiple segments are merged into one large segment.
+
+Prerequisites
+-------------
+
+Multiple data loadings have been performed.
+
+Operation Description
+---------------------
+
+There are three types of compaction: Minor, Major, and Custom.
+
+- Minor compaction:
+
+  In minor compaction, you can specify the number of loads to be merged. If **carbon.enable.auto.load.merge** is set, minor compaction is triggered for every data load. If any segments are available to be merged, compaction runs in parallel with the data load.
+
+  There are two levels in minor compaction:
+
+  - Level 1: Merging of the segments which are not yet compacted
+  - Level 2: Merging of the compacted segments again to form a larger segment
+
+- Major compaction:
+
+  Multiple segments can be merged into one large segment. You can specify the compaction size so that all segments below that size are merged. Major compaction is usually performed during off-peak hours.
+
+- .. _mrs_01_1415__en-us_topic_0000001219149063_li68503712544:
+
+  Custom compaction:
+
+  In custom compaction, you can specify the IDs of multiple segments to merge them into one large segment. All the specified segment IDs must exist and be valid; otherwise, the compaction fails. Custom compaction is usually performed during off-peak hours.
+
+For details, see :ref:`ALTER TABLE COMPACTION `.
+
+.. _mrs_01_1415__en-us_topic_0000001219149063_t9ba7557f991f4d6caad3710c4a51b9f2:
+
+..
table:: **Table 1** Compaction parameters + + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Application Type | Description | + +=========================================+=================+==================+====================================================================================================================================================================================================================================================================================================================================================+ + | carbon.enable.auto.load.merge | false | Minor | Whether to enable compaction along with data loading. | + | | | | | + | | | | **true**: Compaction is automatically triggered when data is loaded. | + | | | | | + | | | | **false**: Compaction is not triggered when data is loaded. | + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.compaction.level.threshold | 4,3 | Minor | This configuration is for minor compaction which decides how many segments to be merged. | + | | | | | + | | | | For example, if this parameter is set to **2,3**, minor compaction is triggered every two segments and segments form a single level 1 compacted segment. When the number of compacted level 1 segments reach 3, compaction is triggered again to merge them to form a single level 2 segment. | + | | | | | + | | | | The compaction policy depends on the actual data size and available resources. | + | | | | | + | | | | The value ranges from 0 to 100. | + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.major.compaction.size | 1024 MB | Major | The major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. | + | | | | | + | | | | For example, if this parameter is set to 1024 MB, and there are five segments whose sizes are 300 MB, 400 MB, 500 MB, 200 MB, and 100 MB used for major compaction, only segments whose total size is less than this threshold are compacted. In this example, only the segments whose sizes are 300 MB, 400 MB, 200 MB, and 100 MB are compacted. 
| + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.numberof.preserve.segments | 0 | Minor/Major | If you want to preserve some number of segments from being compacted, then you can set this configuration. | + | | | | | + | | | | For example, if **carbon.numberof.preserve.segments** is set to **2**, the latest two segments will always be excluded from the compaction. | + | | | | | + | | | | By default, no segments are reserved. | + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.allowed.compaction.days | 0 | Minor/Major | This configuration is used to control on the number of recent segments that needs to be compacted. | + | | | | | + | | | | For example, if this parameter is set to **2**, the segments which are loaded in the time frame of past 2 days only will get merged. Segments which are loaded earlier than 2 days will not be merged. | + | | | | | + | | | | This configuration is disabled by default. | + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.number.of.cores.while.compacting | 2 | Minor/Major | Number of cores to be used while compacting data. The greater the number of cores, the better the compaction performance. If the CPU resources are sufficient, you can increase the value of this parameter. | + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.merge.index.in.segment | true | SEGMENT_INDEX | If this parameter is set to **true**, all the Carbon index (.carbonindex) files in a segment will be merged into a single Index (.carbonindexmerge) file. This enhances the first query performance. 
| + +-----------------------------------------+-----------------+------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Reference +--------- + +You are advised not to perform minor compaction on historical data. For details, see :ref:`How to Avoid Minor Compaction for Historical Data? `. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/deleting_segments.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/deleting_segments.rst new file mode 100644 index 0000000..1b7fc8f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/deleting_segments.rst @@ -0,0 +1,101 @@ +:original_name: mrs_01_1414.html + +.. _mrs_01_1414: + +Deleting Segments +================= + +Scenario +-------- + +If you want to modify and reload the data because you have loaded wrong data into a table, or there are too many bad records, you can delete specific segments by segment ID or data loading time. + +.. note:: + + The segment deletion operation only deletes segments that are not compacted. You can run the **CLEAN FILES** command to clear compacted segments. + +Deleting a Segment by Segment ID +-------------------------------- + +Each segment has a unique ID. This segment ID can be used to delete the segment. + +#. Obtain the segment ID. + + Command: + + **SHOW SEGMENTS FOR Table** *dbname.tablename LIMIT number_of_loads;* + + Example: + + **SHOW SEGMENTS FOR TABLE** *carbonTable;* + + Run the preceding command to show all the segments of the table named **carbonTable**. + + **SHOW SEGMENTS FOR TABLE** *carbonTable LIMIT 2;* + + Run the preceding command to show segments specified by *number_of_loads*. + + The command output is as follows: + + .. code-block:: + + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format | + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | 3 | Success | 2020-09-28 22:53:26.336 | 3.726S | {} | 6.47KB | 3.30KB | columnar_v3 | + | 2 | Success | 2020-09-28 22:53:01.702 | 6.688S | {} | 6.47KB | 3.30KB | columnar_v3 | + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + + .. note:: + + The output of the **SHOW SEGMENTS** command includes ID, Status, Load Start Time, Load Time Taken, Partition, Data Size, Index Size, and File Format. The latest loading information is displayed in the first line of the command output. + +#. Run the following command to delete the segment after you have found the Segment ID: + + Command: + + **DELETE FROM TABLE tableName WHERE SEGMENT.ID IN (load_sequence_id1, load_sequence_id2, ....)**; + + Example: + + **DELETE FROM TABLE carbonTable WHERE SEGMENT.ID IN (1,2,3)**; + + For details, see :ref:`DELETE SEGMENT by ID `. 
+ +Deleting a Segment by Data Loading Time +--------------------------------------- + +You can delete a segment based on the loading time. + +Command: + +**DELETE FROM TABLE db_name.table_name WHERE SEGMENT.STARTTIME BEFORE date_value**; + +Example: + +**DELETE FROM TABLE carbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-07-01 12:07:20'**; + +The preceding command can be used to delete all segments before 2017-07-01 12:07:20. + +For details, see :ref:`DELETE SEGMENT by DATE `. + +Result +------ + +Data of corresponding segments is deleted and is unavailable for query. You can run the **SHOW SEGMENTS** command to display the segment status and check whether the segment has been deleted. + +.. note:: + + - Segments are not physically deleted after the execution of the **DELETE SEGMENT** command. Therefore, if you run the **SHOW SEGMENTS** command to check the status of a deleted segment, it will be marked as **Marked for Delete**. If you run the **SELECT \* FROM tablename** command, the deleted segment will be excluded. + + - The deleted segment will be deleted physically only when the next data loading reaches the maximum query execution duration, which is configured by the **max.query.execution.time** parameter. The default value of the parameter is 60 minutes. + + - If you want to forcibly delete a physical segment file, run the **CLEAN FILES** command. + + Example: + + **CLEAN FILES FOR TABLE table1;** + + This command will physically delete the segment file in the **Marked for delete** state. + + If this command is executed before the time specified by **max.query.execution.time** arrives, the query may fail. **max.query.execution.time** indicates the maximum time allowed for a query, which is set in the **carbon.properties** file. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/index.rst new file mode 100644 index 0000000..e5d9749 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1412.html + +.. _mrs_01_1412: + +CarbonData Table Data Management +================================ + +- :ref:`Loading Data ` +- :ref:`Deleting Segments ` +- :ref:`Combining Segments ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + loading_data + deleting_segments + combining_segments diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/loading_data.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/loading_data.rst new file mode 100644 index 0000000..c5c5070 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_data_management/loading_data.rst @@ -0,0 +1,11 @@ +:original_name: mrs_01_1413.html + +.. _mrs_01_1413: + +Loading Data +============ + +Scenario +-------- + +After a CarbonData table is created, you can run the **LOAD DATA** command to load data to the table for query. Once data loading is triggered, data is encoded in CarbonData format and files in multi-dimensional and column-based format are compressed and copied to the HDFS path of CarbonData files for quick analysis and queries. 
The HDFS path can be configured in the **carbon.properties** file. For details, see :ref:`Configuration Reference `. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/about_carbondata_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/about_carbondata_table.rst new file mode 100644 index 0000000..82b05e9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/about_carbondata_table.rst @@ -0,0 +1,85 @@ +:original_name: mrs_01_1408.html + +.. _mrs_01_1408: + +About CarbonData Table +====================== + +Overview +-------- + +In CarbonData, data is stored in entities called tables. CarbonData tables are similar to RDBMS tables. RDBMS data is stored in a table consisting of rows and columns. CarbonData tables store structured data, and have fixed columns and data types. + +Supported Data Types +-------------------- + +CarbonData tables support the following data types: + +- Int +- String +- BigInt +- Smallint +- Char +- Varchar +- Boolean +- Decimal +- Double +- TimeStamp +- Date +- Array +- Struct +- Map + +The following table describes supported data types and their respective values range. + +.. table:: **Table 1** CarbonData data types + + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Data Type | Value Range | + +======================================================+======================================================================================================================================================================+ + | Int | 4-byte signed integer ranging from -2,147,483,648 to 2,147,483,647. | + | | | + | | .. note:: | + | | | + | | If a non-dictionary column is of the **int** data type, it is internally stored as the **BigInt** type. | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | String | 100,000 characters | + | | | + | | .. note:: | + | | | + | | If the **CHAR** or **VARCHAR** data type is used in **CREATE TABLE**, the two data types are automatically converted to the String data type. | + | | | + | | If a column contains more than 32,000 characters, add the column to the **LONG_STRING_COLUMNS** attribute of the **tblproperties** table during table creation. 
| + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | BigInt | 64-bit value ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SmallInt | -32,768 to 32,767 | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Char | A to Z and a to z | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Varchar | A to Z, a to z, and 0 to 9 | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Boolean | **true** or **false** | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Decimal | The default value is (10,0) and maximum value is (38,38). | + | | | + | | .. note:: | + | | | + | | When query with filters, append **BD** to the number to achieve accurate results. For example, **select \* from carbon_table where num = 1234567890123456.22BD**. | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Double | 64-bit value ranging from 4.9E-324 to 1.7976931348623157E308 | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | TimeStamp | The default format is **yyyy-MM-dd HH:mm:ss**. | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Date | The **DATE** data type is used to store calendar dates. The default format is **yyyy-MM-DD**. | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Array | N/A | + | | | + | | .. note:: | + | | | + | | Currently, only two layers of complex types can be nested. 
| + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Struct | | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Map | | + +------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/creating_a_carbondata_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/creating_a_carbondata_table.rst new file mode 100644 index 0000000..e7529e0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/creating_a_carbondata_table.rst @@ -0,0 +1,92 @@ +:original_name: mrs_01_1409.html + +.. _mrs_01_1409: + +Creating a CarbonData Table +=========================== + +Scenario +-------- + +A CarbonData table must be created to load and query data. You can run the **Create Table** command to create a table. This command is used to create a table using custom columns. + +Creating a Table with Self-Defined Columns +------------------------------------------ + +Users can create a table by specifying its columns and data types. + +Sample command: + +**CREATE TABLE** *IF NOT EXISTS productdb.productSalesTable (* + +*productNumber Int,* + +*productName String,* + +*storeCity String,* + +*storeProvince String,* + +*productCategory String,* + +*productBatch String,* + +*saleQuantity Int,* + +*revenue Int)* + +STORED AS *carbondata* + +*TBLPROPERTIES (* + +*'table_blocksize'='128');* + +The following table describes parameters of preceding commands. + +.. table:: **Table 1** Parameter description + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===================================================================================================================================================================================================================================+ + | productSalesTable | Table name. The table is used to load data for analysis. | + | | | + | | The table name consists of letters, digits, and underscores (_). | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | productdb | Database name. The database maintains logical connections with tables stored in it to identify and manage the tables. | + | | | + | | The database name consists of letters, digits, and underscores (_). 
| + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | productName | Columns in the table. The columns are service entities for data analysis. | + | | | + | storeCity | The column name (field name) consists of letters, digits, and underscores (_). | + | | | + | storeProvince | | + | | | + | procuctCategory | | + | | | + | productBatch | | + | | | + | saleQuantity | | + | | | + | revenue | | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table_blocksize | Indicates the block size of data files used by the CarbonData table, in MB. The value ranges from **1** to **2048**. The default value is **1024**. | + | | | + | | If **table_blocksize** is too small, a large number of small files will be generated when data is loaded. This may affect the performance of HDFS. | + | | | + | | If **table_blocksize** is too large, during data query, the amount of block data that matches the index is large, and some blocks contain a large number of blocklets, affecting read concurrency and lowering query performance. | + | | | + | | You are advised to set the block size based on the data volume. For example, set the block size to 256 MB for GB-level data, 512 MB for TB-level data, and 1024 MB for PB-level data. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. note:: + + - Measurement of all integer data is processed and displayed using the **BigInt** data type. + - CarbonData parses data strictly. Any data that cannot be parsed is saved as **null** in the table. For example, if the user loads the **double** value (3.14) to the BigInt column, the data is saved as **null**. + - The Short and Long data types used in the **Create Table** command are shown as smallint and bigint in the **DESCRIBE** command, respectively. + - You can run the **DESCRIBE** command to view the table data size and table index size. + +Operation Result +---------------- + +Run the command to create a table. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/deleting_a_carbondata_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/deleting_a_carbondata_table.rst new file mode 100644 index 0000000..49dbd7c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/deleting_a_carbondata_table.rst @@ -0,0 +1,33 @@ +:original_name: mrs_01_1410.html + +.. _mrs_01_1410: + +Deleting a CarbonData Table +=========================== + +Scenario +-------- + +You can run the **DROP TABLE** command to delete a table. After a CarbonData table is deleted, its metadata and loaded data are deleted together. 
+
+Procedure
+---------
+
+Run the following command to delete a CarbonData table:
+
+**DROP TABLE** *[IF EXISTS] [db_name.]table_name;*
+
+Once this command is executed, the table is deleted from the system. In the command, **db_name** is an optional parameter. If **db_name** is not specified, the table named **table_name** in the current database is deleted.
+
+Example:
+
+**DROP TABLE** *productdb.productSalesTable;*
+
+Run the preceding command to delete the **productSalesTable** table from the **productdb** database.
+
+Operation Result
+----------------
+
+The table specified in the command is deleted from the system. After the table is deleted, you can run the **SHOW TABLES** command to check whether the deletion was successful. For details, see :ref:`SHOW TABLES `.
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/index.rst new file mode 100644 index 0000000..731323d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/index.rst @@ -0,0 +1,20 @@
+:original_name: mrs_01_1407.html
+
+.. _mrs_01_1407:
+
+CarbonData Table Management
+===========================
+
+- :ref:`About CarbonData Table `
+- :ref:`Creating a CarbonData Table `
+- :ref:`Deleting a CarbonData Table `
+- :ref:`Modify the CarbonData Table `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   about_carbondata_table
+   creating_a_carbondata_table
+   deleting_a_carbondata_table
+   modify_the_carbondata_table
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/modify_the_carbondata_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/modify_the_carbondata_table.rst new file mode 100644 index 0000000..7dd8ee5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/carbondata_table_management/modify_the_carbondata_table.rst @@ -0,0 +1,41 @@
+:original_name: mrs_01_1411.html
+
+.. _mrs_01_1411:
+
+Modify the CarbonData Table
+===========================
+
+**SET** and **UNSET**
+---------------------
+
+When the **SET** command is executed, the new properties overwrite the existing ones.
+
+- SORT SCOPE
+
+  The following is an example of the **SET SORT SCOPE** command:
+
+  **ALTER TABLE** *tablename* **SET TBLPROPERTIES('SORT_SCOPE'**\ =\ *'no_sort'*)
+
+  After running the **UNSET SORT SCOPE** command, the default value **NO_SORT** is adopted.
+
+  The following is an example of the **UNSET SORT SCOPE** command:
+
+  **ALTER TABLE** *tablename* **UNSET TBLPROPERTIES('SORT_SCOPE'**)
+
+- SORT COLUMNS
+
+  The following is an example of the **SET SORT COLUMNS** command:
+
+  **ALTER TABLE** *tablename* **SET TBLPROPERTIES('SORT_COLUMNS'**\ =\ *'column1'*)
+
+  After this command is executed, the new value of **SORT_COLUMNS** is used. You can adjust **SORT_COLUMNS** based on query results, but the original data is not affected. The operation does not affect the query performance of the original data segments, which are not sorted by the new **SORT_COLUMNS**.
+
+  The **UNSET** command is not supported, but **SORT_COLUMNS** can be set to an empty string instead of using the **UNSET** command.
+ + **ALTER TABLE** *tablename* **SET TBLPROPERTIES('SORT_COLUMNS'**\ =\ *''*) + + .. note:: + + - The later version will enhance custom compaction to resort the old segments. + - The value of **SORT_COLUMNS** cannot be modified in the streaming table. + - If the **inverted index** column is removed from **SORT_COLUMNS**, **inverted index** will not be created in this column. However, the old configuration of **INVERTED_INDEX** will be kept. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/index.rst new file mode 100644 index 0000000..62818be --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1405.html + +.. _mrs_01_1405: + +CarbonData Operation Guide +========================== + +- :ref:`CarbonData Quick Start ` +- :ref:`CarbonData Table Management ` +- :ref:`CarbonData Table Data Management ` +- :ref:`CarbonData Data Migration ` +- :ref:`Migrating Data on CarbonData from Spark1.5 to Spark2x ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + carbondata_quick_start + carbondata_table_management/index + carbondata_table_data_management/index + carbondata_data_migration + migrating_data_on_carbondata_from_spark1.5_to_spark2x diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/migrating_data_on_carbondata_from_spark1.5_to_spark2x.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/migrating_data_on_carbondata_from_spark1.5_to_spark2x.rst new file mode 100644 index 0000000..8ecb054 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_operation_guide/migrating_data_on_carbondata_from_spark1.5_to_spark2x.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_2301.html + +.. _mrs_01_2301: + +Migrating Data on CarbonData from Spark1.5 to Spark2x +===================================================== + +Migration Solution Overview +--------------------------- + +This migration guides you to migrate the CarbonData table data of Spark 1.5 to that of Spark2x. + +.. note:: + + Before performing this operation, you need to stop the data import service of the CarbonData table in Spark 1.5 and migrate data to the CarbonData table of Spark2x at a time. After the migration is complete, use Spark2x to perform service operations. + +Migration roadmap: + +#. Use Spark 1.5 to migrate historical data to the intermediate table. +#. Use Spark2x to migrate data from the intermediate table to the target table and change the target table name to the original table name. +#. After the migration is complete, use Spark2x to operate data in the CarbonData table. + +Migration Solution and Commands +------------------------------- + +**Migrating Historical Data** + +#. Stop the CarbonData data import service, use spark-beeline of Spark 1.5 to view the ID and time of the latest segment in the CarbonData table, and record the segment ID. + + **show segments for table dbname.tablename;** + +#. .. _mrs_01_2301__en-us_topic_0000001173470780_li1092003116117: + + Run spark-beeline of Spark 1.5 as the user who has created the original CarbonData table to create an intermediate table in ORC or Parquet format. Then import the data in the original CarbonData table to the intermediate table. After the import is complete, the services of the CarbonData table can be restored. 
+ + Create an ORC table. + + **CREATE TABLE dbname.mid_tablename_orc STORED AS ORC as select \* from dbname.tablename;** + + Create a Parquet table. + + **CREATE TABLE dbname.mid_tablename_parq STORED AS PARQUET as select \* from dbname.tablename;** + + In the preceding command, **dbname** indicates the database name and **tablename** indicates the name of the original CarbonData table. + +#. .. _mrs_01_2301__en-us_topic_0000001173470780_li192112311210: + + Run spark-beeline of Spark2x as the user who has created the original CarbonData table. Run the table creation statement of the old table to create a CarbonData table. + + .. note:: + + In the statement for creating a new table, the field sequence and type must be the same as those of the old table. In this way, the index column structure of the old table can be retained, which helps avoid errors caused by the use of **select \*** statement during data insertion. + + Run the spark-beeline command of Spark 1.5 to view the table creation statement of the old table: **SHOW CREATE TABLE dbname.tablename;** + + Create a CarbonData table named **dbname.new_tablename**. + +#. Run spark-beeline of Spark2x as the user who has created the original CarbonData table to load the intermediate table data in ORC (or PARQUET) format created in :ref:`2 ` to the new table created in :ref:`3 `. This step may take a long time (about 2 hours for 200 GB data). The following uses the ORC intermediate table as an example to describe the command for loading data: + + **insert into dbname.new_tablename select \*** + + **from dbname. mid_tablename_orc;** + +#. Run spark-beeline of Spark2x as the user who has created the original CarbonData table to query and verify the data in the new table. If the data is correct, change the name of the original CarbonData table and then change the name of the new CarbonData table to the name of the original one. + + **ALTER TABLE dbname.tablename RENAME TO dbname.old_tablename;** + + **ALTER TABLE dbname.new_tablename RENAME TO dbname.tablename;** + +#. Complete the migration. In this case, you can use Spark2x to query the new table and rebuild the secondary index. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/configurations_for_performance_tuning.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/configurations_for_performance_tuning.rst new file mode 100644 index 0000000..31cb509 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/configurations_for_performance_tuning.rst @@ -0,0 +1,136 @@ +:original_name: mrs_01_1421.html + +.. _mrs_01_1421: + +Configurations for Performance Tuning +===================================== + +Scenario +-------- + +This section describes the configurations that can improve CarbonData performance. + +Procedure +--------- + +:ref:`Table 1 ` and :ref:`Table 2 ` describe the configurations about query of CarbonData. + +.. _mrs_01_1421__en-us_topic_0000001173949286_t21812cb0660b4dc8b7b03d48f5b8e23e: + +.. 
table:: **Table 1** Number of tasks started for the shuffle process + + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | spark.sql.shuffle.partitions | + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | spark-defaults.conf | + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data query | + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Number of tasks started for the shuffle process in Spark | + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | You are advised to set this parameter to one to two times as much as the executor cores. In an aggregation scenario, reducing the number from 200 to 32 can reduce the query time by two folds. | + +----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1421__en-us_topic_0000001173949286_t8a9249ffd966446e9bfade15a686addd: + +.. 
table:: **Table 2** Number of executors and vCPUs, and memory size used for CarbonData data query + + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | spark.executor.cores | + | | | + | | spark.executor.instances | + | | | + | | spark.executor.memory | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | spark-defaults.conf | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data query | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Number of executors and vCPUs, and memory size used for CarbonData data query | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | In the bank scenario, configuring 4 vCPUs and 15 GB memory for each executor will achieve good performance. The two values do not mean the more the better. 
Configure the two values properly in case of limited resources. If each node has 32 vCPUs and 64 GB memory in the bank scenario, the memory is not sufficient. If each executor has 4 vCPUs and 12 GB memory, Garbage Collection may occur during query, time spent on query from increases from 3s to more than 15s. In this case, you need to increase the memory or reduce the number of vCPUs. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +:ref:`Table 3 `, :ref:`Table 4 `, and :ref:`Table 5 ` describe the configurations for CarbonData data loading. + +.. _mrs_01_1421__en-us_topic_0000001173949286_t237c47d9db1c411eaf39aa4d920be2f3: + +.. table:: **Table 3** Number of vCPUs used for data loading + + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | carbon.number.of.cores.while.loading | + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | carbon.properties | + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data loading | + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Number of vCPUs used for data processing during data loading in CarbonData | + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | If there are sufficient CPUs, you can increase the number of vCPUs to improve performance. For example, if the value of this parameter is changed from 2 to 4, the CSV reading performance can be doubled. | + +----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1421__en-us_topic_0000001173949286_tcac192b5a3174a15b095684ff1ed0f80: + +.. 
table:: **Table 4** Whether to use Yarn local directories for multi-disk data loading + + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | carbon.use.local.dir | + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | carbon.properties | + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data loading | + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Whether to use Yarn local directories for multi-disk data loading | + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | If this parameter is set to **true**, CarbonData uses local Yarn directories for multi-table load disk load balance, improving data loading performance. | + +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1421__en-us_topic_0000001173949286_tc570296297a34147bc0c5800bff5ef56: + +.. table:: **Table 5** Whether to use multiple directories during loading + + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | carbon.use.multiple.temp.dir | + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | carbon.properties | + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data loading | + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Whether to use multiple temporary directories to store temporary sort files | + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | If this parameter is set to **true**, multiple temporary directories are used to store temporary sort files during data loading. This configuration improves data loading performance and prevents single points of failure (SPOFs) on disks. 
| + +----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +:ref:`Table 6 ` describes the configurations for CarbonData data loading and query. + +.. _mrs_01_1421__en-us_topic_0000001173949286_taf36a94822c9418ebec5d418fa2cce2e: + +.. table:: **Table 6** Number of vCPUs used for data loading and query + + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | carbon.compaction.level.threshold | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration File | carbon.properties | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Function | Data loading and query | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | For minor compaction, specifies the number of segments to be merged in stage 1 and number of compacted segments to be merged in stage 2. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tuning | Each CarbonData load will create one segment, if every load is small in size, it will generate many small files over a period of time impacting the query performance. Configuring this parameter will merge the small segments to one big segment which will sort the data and improve the performance. | + | | | + | | The compaction policy depends on the actual data size and available resources. For example, a bank loads data once a day and at night when no query is performed. If resources are sufficient, the compaction policy can be 6 or 5. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. 
table:: **Table 7** Whether to enable data pre-loading when the index cache server is used + + +----------------------+--------------------------------------------------------------------------------------------------------------------+ + | Parameter | carbon.indexserver.enable.prepriming | + +----------------------+--------------------------------------------------------------------------------------------------------------------+ + | Configuration File | carbon.properties | + +----------------------+--------------------------------------------------------------------------------------------------------------------+ + | Function | Data loading | + +----------------------+--------------------------------------------------------------------------------------------------------------------+ + | Scenario Description | Enabling data pre-loading during the use of the index cache server can improve the performance of the first query. | + +----------------------+--------------------------------------------------------------------------------------------------------------------+ + | Tuning | You can set this parameter to **true** to enable the pre-loading function. The default value is **false**. | + +----------------------+--------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/index.rst new file mode 100644 index 0000000..3eda8a2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1417.html + +.. _mrs_01_1417: + +CarbonData Performance Tuning +============================= + +- :ref:`Tuning Guidelines ` +- :ref:`Suggestions for Creating CarbonData Tables ` +- :ref:`Configurations for Performance Tuning ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + tuning_guidelines + suggestions_for_creating_carbondata_tables + configurations_for_performance_tuning diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/suggestions_for_creating_carbondata_tables.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/suggestions_for_creating_carbondata_tables.rst new file mode 100644 index 0000000..65c7fac --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/suggestions_for_creating_carbondata_tables.rst @@ -0,0 +1,129 @@ +:original_name: mrs_01_1419.html + +.. _mrs_01_1419: + +Suggestions for Creating CarbonData Tables +========================================== + +Scenario +-------- + +This section provides suggestions based on more than 50 test cases to help you create CarbonData tables with higher query performance. + +.. table:: **Table 1** Columns in the CarbonData table + + =========== ============= =========== =========== + Column name Data type Cardinality Attribution + =========== ============= =========== =========== + msisdn String 30 million dimension + BEGIN_TIME bigint 10,000 dimension + host String 1 million dimension + dime_1 String 1,000 dimension + dime_2 String 500 dimension + dime_3 String 800 dimension + counter_1 numeric(20,0) NA measure + ... ... 
NA measure + counter_100 numeric(20,0) NA measure + =========== ============= =========== =========== + +Procedure +--------- + +- If the to-be-created table contains a column that is frequently used for filtering, for example, this column is used in more than 80% of filtering scenarios, + + implement optimization as follows: + + Place this column in the first column of **sort_columns**. + + For example, if **msisdn** is the most frequently used filter criterion in a query, place it in the first column. Run the following command to create a table. The query performance is good if **msisdn** is used as the filter condition. + + .. code-block:: + + create table carbondata_table( + msisdn String, + ... + )STORED AS carbondata TBLPROPERTIES ('SORT_COLUMNS'='msisdn'); + +- If the to-be-created table has multiple columns which are frequently used to filter the results, + + implement optimization as follows: + + Create an index for the columns. + + For example, if **msisdn**, **host**, and **dime_1** are frequently used columns, the **sort_columns** column sequence is "dime_1 -> host -> msisdn..." based on cardinality. Run the following command to create a table. The following command can improve the filtering performance of **dime_1**, **host**, and **msisdn**. + + .. code-block:: + + create table carbondata_table( + dime_1 String, + host String, + msisdn String, + dime_2 String, + dime_3 String, + ... + )STORED AS carbondata + TBLPROPERTIES ('SORT_COLUMNS'='dime_1,host,msisdn'); + +- If the frequency of each column used for filtering is similar, + + implement optimization as follows: + + List the columns in **sort_columns** in ascending order of cardinality. + + Run the following command to create a table: + + .. code-block:: + + create table carbondata_table( + Dime_1 String, + BEGIN_TIME bigint, + HOST String, + MSISDN String, + ... + )STORED AS carbondata + TBLPROPERTIES ('SORT_COLUMNS'='dime_2,dime_3,dime_1,BEGIN_TIME,host,msisdn'); + +- Create tables in ascending order of cardinalities. Then create secondary indexes for the columns with higher cardinalities. The statements for creating the indexes are as follows: + + .. code-block:: + + create index carbondata_table_index_msisdn on table carbondata_table ( + MSISDN String) as 'carbondata' PROPERTIES ('table_blocksize'='128'); + create index carbondata_table_index_host on table carbondata_table ( + host String) as 'carbondata' PROPERTIES ('table_blocksize'='128'); + +- For measure columns that do not require high accuracy, the numeric(20,0) data type is unnecessary. You are advised to use the double data type instead of numeric(20,0) to improve query performance. + + Performance analysis of the test case shows that the query execution time is reduced from 15 seconds to 3 seconds, improving performance by nearly five times. The command for creating a table is as follows: + + .. code-block:: + + create table carbondata_table( + Dime_1 String, + BEGIN_TIME bigint, + HOST String, + MSISDN String, + counter_1 double, + counter_2 double, + ... + counter_100 double + )STORED AS carbondata; + +- If the values of a column (for example, **start_time**) are incremental: + + For example, if data is loaded to CarbonData every day, **start_time** is incremental for each load. In this case, it is recommended that the **start_time** column be put at the end of **sort_columns**, because incremental values make efficient use of the min/max index. The command for creating a table is as follows: + + .. 
code-block:: + + create table carbondata_table( + Dime_1 String, + HOST String, + MSISDN String, + counter_1 double, + counter_2 double, + BEGIN_TIME bigint, + ... + counter_100 double + )STORED AS carbondata + TBLPROPERTIES ('SORT_COLUMNS'='dime_2,dime_3,dime_1,...,BEGIN_TIME'); diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/tuning_guidelines.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/tuning_guidelines.rst new file mode 100644 index 0000000..0a50340 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_performance_tuning/tuning_guidelines.rst @@ -0,0 +1,79 @@ +:original_name: mrs_01_1418.html + +.. _mrs_01_1418: + +Tuning Guidelines +================= + +Query Performance Tuning +------------------------ + +There are various parameters that can be tuned to improve the query performance in CarbonData. Most of the parameters focus on increasing the parallelism in processing and optimizing system resource usage. + +- Spark executor count: Executors are basic entities of parallelism in Spark. Raising the number of executors can increase the amount of parallelism in the cluster. For details about how to configure the number of executors, see the Spark documentation. +- Executor cores: The number of executor cores controls the number of concurrent tasks that each executor can run. Increasing the number of executor cores adds more concurrent processing tasks and improves performance. +- HDFS block size: CarbonData assigns query tasks by allocating different blocks to different executors for processing. The HDFS block is the partition unit. CarbonData maintains a global block-level index in the Spark driver, which helps reduce the number of blocks that need to be scanned for a query. A higher block size means higher I/O efficiency but lower global index efficiency. Conversely, a lower block size means lower I/O efficiency, higher global index efficiency, and greater memory consumption. +- Number of scanner threads: Scanner threads control the number of data blocks that are processed in parallel by each task. By increasing the number of scanner threads, you can increase the number of data blocks that are processed in parallel to improve performance. The **carbon.number.of.cores** parameter in the **carbon.properties** file is used to configure the number of scanner threads. For example, **carbon.number.of.cores = 4**. +- B-Tree caching: The cache memory can be optimized using the B-Tree least recently used (LRU) caching. In the driver, the B-Tree LRU caching configuration helps free up the cache by releasing table segments which are not accessed or used. Similarly, in the executor, the B-Tree LRU caching configuration helps release table blocks that are not accessed or used. For details, see the description of **carbon.max.driver.lru.cache.size** and **carbon.max.executor.lru.cache.size** in :ref:`Table 2 `. + +CarbonData Query Process +------------------------ + +When CarbonData receives a query task for a table, for example, a query for table A, the index data of table A is loaded to the memory for the query process. When CarbonData receives a query task for table A again, the system does not need to load the index data of table A. + +When a query is performed in CarbonData, the query task is divided into several scan tasks, namely, task splitting based on HDFS blocks. Scan tasks are executed by executors on the cluster. 
Tasks can run in parallel, partially in parallel, or in sequence, depending on the number of executors and the configured number of executor cores. + +Some parts of a query task can be processed at the individual task level, such as **select** and **filter**. Other parts of a query task can be only partially processed at the individual task level, such as **group-by**, **count**, and **distinct count**. + +Some operations cannot be performed at the task level, such as **Having Clause** (filter after grouping) and **sort**. Operations which cannot be performed at the task level, or can be only partially performed at the task level, require data (partial results) transmission across executors on the cluster. The transmission operation is called shuffle. + +The more tasks there are, the more data needs to be shuffled. This affects query performance. + +The number of tasks depends on the number of HDFS blocks, and the number of blocks depends on the size of each block. You are advised to configure a proper HDFS block size to achieve a balance among increased parallelism, the amount of data to be shuffled, and the size of aggregate tables. + +Relationship Between Splits and Executors +----------------------------------------- + +If the number of splits is less than or equal to the executor count multiplied by the executor core count, the tasks are run in parallel. Otherwise, some tasks can start only after other tasks are complete. Therefore, ensure that the executor count multiplied by executor cores is greater than or equal to the number of splits. In addition, make sure that there are sufficient splits so that a query task can be divided into enough subtasks to ensure concurrency. + +Configuring Scanner Threads +--------------------------- + +The scanner threads property decides the number of data blocks to be processed. If there are too many data blocks, a large number of small data blocks will be generated, affecting performance. If there are too few data blocks, the parallelism is poor and performance is affected. Therefore, when determining the number of scanner threads, you are advised to consider the average data size within a partition and select a value that does not make the data blocks too small. Based on experience, you are advised to divide a single block size (unit: MB) by 250 and use the result as the number of scanner threads. + +The number of actually available vCPUs is an important factor to consider when you want to increase the parallelism. The number of vCPUs that perform parallel computation must not exceed 75% to 80% of the actual vCPUs. + +The number of vCPUs used is approximately equal to: + +Number of parallel tasks x Number of scanner threads, where the number of parallel tasks is the smaller of the number of splits and the executor count x executor cores. + +Data Loading Performance Tuning +------------------------------- + +Tuning of data loading performance is different from that of query performance. Similar to query performance, data loading performance depends on the amount of parallelism that can be achieved. In the case of data loading, the number of worker threads decides the unit of parallelism. Therefore, more executors and executor cores mean better data loading performance. + +To achieve better performance, you can configure the following parameters in HDFS. + +.. 
table:: **Table 1** HDFS configuration + + ===================================== ================= + Parameter Recommended Value + ===================================== ================= + dfs.datanode.drop.cache.behind.reads false + dfs.datanode.drop.cache.behind.writes false + dfs.datanode.sync.behind.writes true + ===================================== ================= + +Compression Tuning +------------------ + +CarbonData uses a few lightweight compression and heavyweight compression algorithms to compress data. Although these algorithms can process any type of data, the compression performance is better if the data is ordered with similar values being together. + +During data loading, data is sorted based on the order of columns in the table to achieve good compression performance. + +Since CarbonData sorts data in the order of columns defined in the table, the order of columns plays an important role in the effectiveness of compression. If the low cardinality dimension is on the left, the range of data partitions after sorting is small and the compression efficiency is high. If a high cardinality dimension is on the left, a range of data partitions obtained after sorting is relatively large, and compression efficiency is relatively low. + +Memory Tuning +------------- + +CarbonData provides a mechanism for memory tuning where data loading depends on the columns needed in the query. Whenever a query command is received, columns required by the query are fetched and data is loaded for those columns in memory. During this operation, if the memory threshold is reached, the least used loaded files are deleted to release memory space for columns required by the query. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/api.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/api.rst new file mode 100644 index 0000000..c49c912 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/api.rst @@ -0,0 +1,136 @@ +:original_name: mrs_01_1450.html + +.. _mrs_01_1450: + +API +=== + +This section describes the APIs and usage methods of Segment. All methods are in the org.apache.spark.util.CarbonSegmentUtil class. + +The following methods have been abandoned: + +.. code-block:: + + /** + * Returns the valid segments for the query based on the filter condition + * present in carbonScanRdd. + * + * @param carbonScanRdd + * @return Array of valid segments + */ + @deprecated def getFilteredSegments(carbonScanRdd: CarbonScanRDD[InternalRow]): Array[String]; + +Usage Method +------------ + +Use the following methods to obtain CarbonScanRDD from the query statement: + +.. code-block:: + + val df=carbon.sql("select * from table where age='12'") + val myscan=df.queryExecution.sparkPlan.collect { + case scan: CarbonDataSourceScan if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd + case scan: RowDataSourceScanExec if scan.rdd.isInstanceOf[CarbonScanRDD[InternalRow]] => scan.rdd + }.head + val carbonrdd=myscan.asInstanceOf[CarbonScanRDD[InternalRow]] + +Example: + +.. code-block:: + + CarbonSegmentUtil.getFilteredSegments(carbonrdd) + +The filtered segment can be obtained by importing SQL statements. + +.. 
code-block:: + + /** + * Returns an array of valid segment numbers based on the filter condition provided in the sql + * NOTE: This API is supported only for SELECT Sql (insert into,ctas,.., is not supported) + * + * @param sql + * @param sparkSession + * @return Array of valid segments + * @throws UnsupportedOperationException because Get Filter Segments API supports if and only + * if only one carbon main table is present in query. + */ + def getFilteredSegments(sql: String, sparkSession: SparkSession): Array[String]; + +Example: + +.. code-block:: + + CarbonSegmentUtil.getFilteredSegments("select * from table where age='12'", sparkSession) + +Import the database name and table name to obtain the list of segments to be merged. The obtained segments can be used as parameters of the getMergedLoadName function. + +.. code-block:: + + /** + * Identifies all segments which can be merged with MAJOR compaction type. + * NOTE: This result can be passed to getMergedLoadName API to get the merged load name. + * + * @param sparkSession + * @param tableName + * @param dbName + * @return list of LoadMetadataDetails + */ + def identifySegmentsToBeMerged(sparkSession: SparkSession, + tableName: String, + dbName: String) : util.List[LoadMetadataDetails]; + +Example: + +.. code-block:: + + CarbonSegmentUtil.identifySegmentsToBeMerged(sparkSession, "table_test","default") + +Import the database name, table name, and obtain all segments which can be merged with CUSTOM compaction type. The obtained segments can be transferred as the parameter of the getMergedLoadName function. + +.. code-block:: + + /** + * Identifies all segments which can be merged with CUSTOM compaction type. + * NOTE: This result can be passed to getMergedLoadName API to get the merged load name. + * + * @param sparkSession + * @param tableName + * @param dbName + * @param customSegments + * @return list of LoadMetadataDetails + * @throws UnsupportedOperationException if customSegments is null or empty. + * @throws MalformedCarbonCommandException if segment does not exist or is not valid + */ + def identifySegmentsToBeMergedCustom(sparkSession: SparkSession, + tableName: String, + dbName: String, + customSegments: util.List[String]): util.List[LoadMetadataDetails]; + +Example: + +.. code-block:: + + val customSegments = new util.ArrayList[String]() + customSegments.add("1") + customSegments.add("2") + CarbonSegmentUtil.identifySegmentsToBeMergedCustom(sparkSession, "table_test","default", customSegments) + +If a segment list is specified, the merged load name is returned. + +.. code-block:: + + /** + * Returns the Merged Load Name for given list of segments + * + * @param list of segments + * @return Merged Load Name + * @throws UnsupportedOperationException if list of segments is less than 1 + */ + def getMergedLoadName(list: util.List[LoadMetadataDetails]): String; + +Example: + +.. 
code-block:: + + val carbonTable = CarbonEnv.getCarbonTable(Option(databaseName), tableName)(sparkSession) + val loadMetadataDetails = SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) CarbonSegmentUtil.getMergedLoadName(loadMetadataDetails.toList.asJava) diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/add_columns.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/add_columns.rst new file mode 100644 index 0000000..9ef3450 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/add_columns.rst @@ -0,0 +1,54 @@ +:original_name: mrs_01_1431.html + +.. _mrs_01_1431: + +ADD COLUMNS +=========== + +Function +-------- + +This command is used to add a column to an existing table. + +Syntax +------ + +**ALTER TABLE** *[db_name.]table_name* **ADD COLUMNS** *(col_name data_type,...)* **TBLPROPERTIES**\ *(''COLUMNPROPERTIES.columnName.shared_column'='sharedFolder.sharedColumnName,...', 'DEFAULT.VALUE.COLUMN_NAME'='default_value')*; + +Parameter Description +--------------------- + +.. table:: **Table 1** ADD COLUMNS parameters + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===================================================================================================================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table_name | Table name. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | col_name data_type | Name of a comma-separated column with a data type. It consists of letters, digits, and underscores (_). | + | | | + | | .. note:: | + | | | + | | When creating a CarbonData table, do not name columns as tupleId, PositionId, and PositionReference because they will be used in UPDATE, DELETE, and secondary index commands. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +- Only **shared_column** and **default_value** are read. If any other property name is specified, no error will be thrown and the property will be ignored. +- If no default value is specified, the default value of the new column is considered null. +- If filter is applied to the column, new columns will not be added during sort. New columns may affect query performance. 
+ +Examples +-------- + +- **ALTER TABLE** *carbon* **ADD COLUMNS** *(a1 INT, b1 STRING)*; +- **ALTER TABLE** *carbon* **ADD COLUMNS** *(a1 INT, b1 STRING)* **TBLPROPERTIES**\ *('COLUMNPROPERTIES.b1.shared_column'='sharedFolder.b1')*; +- ALTER TABLE *carbon* **ADD COLUMNS** *(a1 INT, b1 STRING)* **TBLPROPERTIES**\ *('DEFAULT.VALUE.a1'='10')*; + +System Response +--------------- + +The newly added column can be displayed by running the **DESCRIBE** command. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/alter_table_compaction.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/alter_table_compaction.rst new file mode 100644 index 0000000..edf597e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/alter_table_compaction.rst @@ -0,0 +1,85 @@ +:original_name: mrs_01_1429.html + +.. _mrs_01_1429: + +ALTER TABLE COMPACTION +====================== + +Function +-------- + +The **ALTER TABLE COMPACTION** command is used to merge a specified number of segments into a single segment. This improves the query performance of a table. + +Syntax +------ + +**ALTER TABLE**\ *[db_name.]table_name COMPACT 'MINOR/MAJOR/SEGMENT_INDEX';* + +**ALTER TABLE**\ *[db_name.]table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (id1, id2, ...);* + +Parameter Description +--------------------- + +.. table:: **Table 1** ALTER TABLE COMPACTION parameters + + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===============+================================================================================================================================================================================================================================================================================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table_name | Table name. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | MINOR | Minor compaction. For details, see :ref:`Combining Segments `. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | MAJOR | Major compaction. 
For details, see :ref:`Combining Segments `. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SEGMENT_INDEX | This configuration enables you to merge all the CarbonData index files (**.carbonindex**) inside a segment to a single CarbonData index merge file (**.carbonindexmerge**). This enhances the first query performance. For more information, see :ref:`Table 1 `. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | CUSTOM | Custom compaction. For details, see :ref:`Combining Segments `. | + +---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +N/A + +Examples +-------- + +**ALTER TABLE ProductDatabase COMPACT 'MINOR';** + +**ALTER TABLE ProductDatabase COMPACT 'MAJOR';** + +**ALTER TABLE ProductDatabase COMPACT 'SEGMENT_INDEX';** + +**ALTER TABLE ProductDatabase COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0, 1);** + +System Response +--------------- + +**ALTER TABLE COMPACTION** does not show the response of the compaction because it is run in the background. + +If you want to view the response of minor and major compactions, you can check the logs or run the **SHOW SEGMENTS** command. + +Example: + +.. code-block:: + + +------+------------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format | + +------+------------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | 3 | Success | 2020-09-28 22:53:26.336 | 3.726S | {} | 6.47KB | 3.30KB | columnar_v3 | + | 2 | Success | 2020-09-28 22:53:01.702 | 6.688S | {} | 6.47KB | 3.30KB | columnar_v3 | + | 1 | Compacted | 2020-09-28 22:51:15.242 | 5.82S | {} | 6.50KB | 3.43KB | columnar_v3 | + | 0.1 | Success | 2020-10-30 20:49:24.561 | 16.66S | {} | 12.87KB | 6.91KB | columnar_v3 | + | 0 | Compacted | 2020-09-28 22:51:02.6 | 6.819S | {} | 6.50KB | 3.43KB | columnar_v3 | + +------+------------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + +In the preceding information: + +- **Compacted** indicates that data has been compacted. +- **0.1** indicates the compacting result of segment 0 and segment 1. + +The compact operation does not incur any change to other operations. + +Compacted segments, such as segment 0 and segment 1, become useless. To save space, before you perform other operations, run the **CLEAN FILES** command to delete compacted segments. For more information about the **CLEAN FILES** command, see :ref:`CLEAN FILES `. 
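+
+For example, a minimal sketch that reuses the **ProductDatabase** table from the examples above (run it only after the compacted segments are no longer needed):
+
+.. code-block::
+
+   CLEAN FILES FOR TABLE ProductDatabase;
+
+After this statement is executed, the data files of the compacted segments (segment 0 and segment 1 in this example) are physically deleted.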
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/change_data_type.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/change_data_type.rst new file mode 100644 index 0000000..f7e4544 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/change_data_type.rst @@ -0,0 +1,61 @@ +:original_name: mrs_01_1433.html + +.. _mrs_01_1433: + +CHANGE DATA TYPE +================ + +Function +-------- + +This command is used to change the data type from INT to BIGINT or decimal precision from lower to higher. + +Syntax +------ + +**ALTER TABLE** *[db_name.]table_name* **CHANGE** *col_name col_name changed_column_type*; + +Parameter Description +--------------------- + +.. table:: **Table 1** CHANGE DATA TYPE parameters + + +---------------------+------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=====================+================================================================================================+ + | db_name | Name of the database. If this parameter is left unspecified, the current database is selected. | + +---------------------+------------------------------------------------------------------------------------------------+ + | table_name | Name of the table. | + +---------------------+------------------------------------------------------------------------------------------------+ + | col_name | Name of columns in a table. Column names contain letters, digits, and underscores (_). | + +---------------------+------------------------------------------------------------------------------------------------+ + | changed_column_type | The change in the data type. | + +---------------------+------------------------------------------------------------------------------------------------+ + +Usage Guidelines +---------------- + +- Change of decimal data type from lower precision to higher precision will only be supported for cases where there is no data loss. + + Example: + + - **Invalid scenario** - Change of decimal precision from (10,2) to (10,5) is not valid as in this case only scale is increased but total number of digits remain the same. + - **Valid scenario** - Change of decimal precision from (10,2) to (12,3) is valid as the total number of digits are increased by 2 but scale is increased only by 1 which will not lead to any data loss. + +- The allowed range is 38,38 (precision, scale) and is a valid upper case scenario which is not resulting in data loss. + +Examples +-------- + +- Changing data type of column a1 from INT to BIGINT. + + **ALTER TABLE** *test_db.carbon* **CHANGE** *a1 a1 BIGINT*; + +- Changing decimal precision of column a1 from 10 to 18. + + **ALTER TABLE** *test_db.carbon* **CHANGE** *a1 a1 DECIMAL(18,2)*; + +System Response +--------------- + +By running DESCRIBE command, the changed data type for the modified column is displayed. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table.rst new file mode 100644 index 0000000..da65177 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table.rst @@ -0,0 +1,152 @@ +:original_name: mrs_01_1425.html + +.. 
_mrs_01_1425: + +CREATE TABLE +============ + +Function +-------- + +This command is used to create a CarbonData table by specifying the list of fields along with the table properties. + +Syntax +------ + +**CREATE TABLE** *[IF NOT EXISTS] [db_name.]table_name* + +*[(col_name data_type, ...)]* + +**STORED AS** *carbondata* + +*[TBLPROPERTIES (property_name=property_value, ...)];* + +Additional attributes of all tables are defined in **TBLPROPERTIES**. + +Parameter Description +--------------------- + +.. table:: **Table 1** CREATE TABLE parameters + + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+==============================================================================================================================================================================================+ + | db_name | Database name that contains letters, digits, and underscores (_). | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | col_name data_type | List with data types separated by commas (,). The column name contains letters, digits, and underscores (_). | + | | | + | | .. note:: | + | | | + | | When creating a CarbonData table, do not use tupleId, PositionId, and PositionReference as column names because columns with these names are internally used by secondary index commands. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table_name | Table name of a database that contains letters, digits, and underscores (_). | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | STORED AS | The **carbondata** parameter defines and creates a CarbonData table. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | TBLPROPERTIES | List of CarbonData table properties. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1425__en-us_topic_0000001219230619_s4539dafd333c46ae855caaa175609f60: + +Precautions +----------- + +Table attributes are used as follows: + +- .. _mrs_01_1425__en-us_topic_0000001219230619_l053c6fa1a366488ea6410cb4bb4fc5d1: + + Block size + + The block size of a data file can be defined for a single table using **TBLPROPERTIES**. The larger one between the actual size of the data file and the defined block size is selected as the actual block size of the data file in HDFS. The unit is MB. The default value is 1024 MB. The value ranges from 1 MB to 2048 MB. If the value is beyond the range, the system reports an error. 
+ + Once the block size reaches the configured value, the write program starts a new block of CarbonData data. Data is written in multiples of the page size (32,000 records). Therefore, the boundary is not strict at the byte level. If the new page crosses the boundary of the configured block, the page is written to the new block instead of the current block. + + *TBLPROPERTIES('table_blocksize'='128')* + + .. note:: + + - If a small block size is configured in the CarbonData table while the size of the data file generated by the loaded data is large, the block size displayed in HDFS is different from the configured value. This is because when data is written to a local block file for the first time, even though the size of the to-be-written data is larger than the configured value of the block size, data will still be written into the block. Therefore, the actual value of block size in HDFS is the larger value between the size of the data to be written and the configured block size. + - If **block.num** is less than the parallelism, the blocks are split into new blocks so that new blocks.num is greater than parallelism and all cores can be used. This optimization is called block distribution. + +- **SORT_SCOPE** specifies the sort scope during table creation. There are four types of sort scopes: + + - **GLOBAL_SORT**: It improves query performance, especially for point queries. *TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT'*) + - **LOCAL_SORT**: Data is sorted locally (task-level sorting). + - **NO_SORT**: The default sorting mode is used. Data is loaded in unsorted manner, which greatly improves loading performance. + +- SORT_COLUMNS + + This table property specifies the order of sort columns. + + *TBLPROPERTIES('SORT_COLUMNS'='column1, column3')* + + .. note:: + + - If this attribute is not specified, no columns are sorted by default. + - If this property is specified but with empty argument, then the table will be loaded without sort. For example, *('SORT_COLUMNS'='')*. + - **SORT_COLUMNS** supports the string, date, timestamp, short, int, long, byte, and boolean data types. + +- RANGE_COLUMN + + This property is used to specify a column to partition the input data by range. Only one column can be configured. During data import, you can use **global_sort_partitions** or **scale_factor** to avoid generating small files. + + *TBLPROPERTIES('RANGE_COLUMN'='column1')* + +- LONG_STRING_COLUMNS + + The length of a common string cannot exceed 32,000 characters. To store a string of more than 32,000 characters, set **LONG_STRING_COLUMNS** to the target column. + + *TBLPROPERTIES('LONG_STRING_COLUMNS'='column1, column3')* + + .. note:: + + **LONG_STRING_COLUMNS** can be set only for columns of the STRING, CHAR, or VARCHAR type. + +Scenarios +--------- + +Creating a Table by Specifying Columns + +The **CREATE TABLE** command is the same as that of Hive DDL. The additional configurations of CarbonData are provided as table properties. 
+ +**CREATE TABLE** *[IF NOT EXISTS] [db_name.]table_name* + +*[(col_name data_type , ...)]* + +STORED AS *carbondata* + +*[TBLPROPERTIES (property_name=property_value, ...)];* + +Examples +-------- + +**CREATE TABLE** *IF NOT EXISTS productdb.productSalesTable (* + +*productNumber Int,* + +*productName String,* + +*storeCity String,* + +*storeProvince String,* + +*productCategory String,* + +*productBatch String,* + +*saleQuantity Int,* + +*revenue Int)* + +*STORED AS carbondata* + +*TBLPROPERTIES (* + +*'table_blocksize'='128',* + +*'SORT_COLUMNS'='productBatch, productName')* + +System Response +--------------- + +A table will be created and the success message will be logged in system logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table_as_select.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table_as_select.rst new file mode 100644 index 0000000..0a1cab3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/create_table_as_select.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_1426.html + +.. _mrs_01_1426: + +CREATE TABLE As SELECT +====================== + +Function +-------- + +This command is used to create a CarbonData table by specifying the list of fields along with the table properties. + +Syntax +------ + +**CREATE TABLE**\ * [IF NOT EXISTS] [db_name.]table_name* **STORED AS carbondata** *[TBLPROPERTIES (key1=val1, key2=val2, ...)] AS select_statement;* + +Parameter Description +--------------------- + +.. table:: **Table 1** CREATE TABLE parameters + + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===============+=========================================================================================================================================================+ + | db_name | Database name that contains letters, digits, and underscores (_). | + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table_name | Table name of a database that contains letters, digits, and underscores (_). | + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | STORED AS | Used to store data in CarbonData format. | + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | TBLPROPERTIES | List of CarbonData table properties. For details, see :ref:`Precautions `. | + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +N/A + +Examples +-------- + +**CREATE TABLE** ctas_select_parquet **STORED AS** carbondata as select \* from parquet_ctas_test; + +System Response +--------------- + +This example will create a Carbon table from any Parquet table and load all the records from the Parquet table. 
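+
+As an optional sanity check (a sketch assuming the example tables above), you can compare the row counts of the source Parquet table and the new CarbonData table:
+
+.. code-block::
+
+   select count(*) from parquet_ctas_test;
+   select count(*) from ctas_select_parquet;
+
+Both statements should return the same number of records.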
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_columns.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_columns.rst new file mode 100644 index 0000000..a60ae38 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_columns.rst @@ -0,0 +1,58 @@ +:original_name: mrs_01_1432.html + +.. _mrs_01_1432: + +DROP COLUMNS +============ + +Function +-------- + +This command is used to delete one or more columns from a table. + +Syntax +------ + +**ALTER TABLE** *[db_name.]table_name* **DROP COLUMNS** *(col_name, ...)*; + +Parameter Description +--------------------- + +.. table:: **Table 1** DROP COLUMNS parameters + + +------------+-------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+===================================================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. | + +------------+-------------------------------------------------------------------------------------------------------------------+ + | table_name | Table name. | + +------------+-------------------------------------------------------------------------------------------------------------------+ + | col_name | Name of a column in a table. Multiple columns are supported. It consists of letters, digits, and underscores (_). | + +------------+-------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +After a column is deleted, at least one key column must exist in the schema. Otherwise, an error message is displayed, and the column fails to be deleted. + +Examples +-------- + +Assume that the table contains four columns named a1, b1, c1, and d1. + +- Delete a column: + + **ALTER TABLE** *carbon* **DROP COLUMNS** *(b1)*; + + **ALTER TABLE** *test_db.carbon* **DROP COLUMNS** *(b1)*; + +- Delete multiple columns: + + **ALTER TABLE** *carbon* **DROP COLUMNS** *(b1,c1)*; + + **ALTER TABLE** *test_db.carbon* **DROP COLUMNS** *(b1,c1)*; + +System Response +--------------- + +If you run the **DESCRIBE** command, the deleted columns will not be displayed. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_table.rst new file mode 100644 index 0000000..8781e5a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/drop_table.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_1427.html + +.. _mrs_01_1427: + +DROP TABLE +========== + +Function +-------- + +This command is used to delete an existing table. + +Syntax +------ + +**DROP TABLE** *[IF EXISTS] [db_name.]table_name;* + +Parameter Description +--------------------- + +.. table:: **Table 1** DROP TABLE parameters + + +------------+--------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+======================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. 
| + +------------+--------------------------------------------------------------------------------------+ + | table_name | Name of the table to be deleted | + +------------+--------------------------------------------------------------------------------------+ + +Precautions +----------- + +In this command, **IF EXISTS** and **db_name** are optional. + +Example +------- + +**DROP TABLE IF EXISTS productDatabase.productSalesTable;** + +System Response +--------------- + +The table will be deleted. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/index.rst new file mode 100644 index 0000000..0df5712 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/index.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1424.html + +.. _mrs_01_1424: + +DDL +=== + +- :ref:`CREATE TABLE ` +- :ref:`CREATE TABLE As SELECT ` +- :ref:`DROP TABLE ` +- :ref:`SHOW TABLES ` +- :ref:`ALTER TABLE COMPACTION ` +- :ref:`TABLE RENAME ` +- :ref:`ADD COLUMNS ` +- :ref:`DROP COLUMNS ` +- :ref:`CHANGE DATA TYPE ` +- :ref:`REFRESH TABLE ` +- :ref:`REGISTER INDEX TABLE ` +- :ref:`REFRESH INDEX ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + create_table + create_table_as_select + drop_table + show_tables + alter_table_compaction + table_rename + add_columns + drop_columns + change_data_type + refresh_table + register_index_table + refresh_index diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_index.rst new file mode 100644 index 0000000..2d43932 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_index.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_1436.html + +.. _mrs_01_1436: + +REFRESH INDEX +============= + +Function +-------- + +This command is used to merge all segments for data files in the secondary index table. + +Syntax +------ + +- **REFRESH INDEX** *indextable_name* ON TABLE *maintable_name* + + This command is used to merge all segments for which data files are to be merged. + +- **REFRESH INDEX** *indextable_name* ON TABLE *maintable_name* WHERE SEGMENT.ID IN (0,1,2..N) + + This command is used to merge a batch of specified segments. + +Parameter Description +--------------------- + +.. table:: **Table 1** REFRESH INDEX parameters + + =============== ========================= + Parameter Description + =============== ========================= + indextable_name Name of an index table + maintable_name Name of the primary table + =============== ========================= + +Precautions +----------- + +To clear the data file of compacted segments, run the **CLEAN FILES** command on the secondary index table. + +Examples +-------- + +**REFRESH INDEX** *productNameIndexTable*; + +System Response +--------------- + +After this command is executed, the number of data files in the secondary index table will be reduced. 
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_table.rst new file mode 100644 index 0000000..e584df3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/refresh_table.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_1434.html + +.. _mrs_01_1434: + +REFRESH TABLE +============= + +Function +-------- + +This command is used to register Carbon table to Hive meta store catalogue from exisiting Carbon table data. + +Syntax +------ + +**REFRESH TABLE** *db_name.table_name*; + +Parameter Description +--------------------- + +.. table:: **Table 1** REFRESH TABLE parameters + + +------------+------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+================================================================================================+ + | db_name | Name of the database. If this parameter is left unspecified, the current database is selected. | + +------------+------------------------------------------------------------------------------------------------+ + | table_name | Name of the table. | + +------------+------------------------------------------------------------------------------------------------+ + +Usage Guidelines +---------------- + +- The new database name and the old database name should be same. +- Before executing this command the old table schema and data should be copied into the new database location. +- If the table is aggregate table, then all the aggregate tables should be copied to the new database location. +- For old store, the time zone of the source and destination cluster should be same. +- If old cluster used HIVE meta store to store schema, refresh will not work as schema file does not exist in file system. + +Examples +-------- + +**REFRESH TABLE** *dbcarbon*.\ *productSalesTable*; + +System Response +--------------- + +By running this command, the Carbon table will be registered to Hive meta store catalogue from exisiting Carbon table data. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/register_index_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/register_index_table.rst new file mode 100644 index 0000000..821b592 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/register_index_table.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_1435.html + +.. _mrs_01_1435: + +REGISTER INDEX TABLE +==================== + +Function +-------- + +This command is used to register an index table with the primary table. + +Syntax +------ + +**REGISTER INDEX TABLE** *indextable_name* ON *db_name.maintable_name*; + +Parameter Description +--------------------- + +.. table:: **Table 1** REFRESH INDEX TABLE parameters + + +-----------------+--------------------------------------------------------------------------------------+ + | Parameter | Description | + +=================+======================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. | + +-----------------+--------------------------------------------------------------------------------------+ + | indextable_name | Index table name. 
| + +-----------------+--------------------------------------------------------------------------------------+ + | maintable_name | Primary table name. | + +-----------------+--------------------------------------------------------------------------------------+ + +Precautions +----------- + +Before running this command, run **REFRESH TABLE** to register the primary table and secondary index table with the Hive metastore. + +Examples +-------- + +**REGISTER INDEX TABLE** *productNameIndexTable* ON *productdb*.\ *productSalesTable*; + +System Response +--------------- + +By running this command, the index table will be registered to the primary table. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/show_tables.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/show_tables.rst new file mode 100644 index 0000000..1764b84 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/show_tables.rst @@ -0,0 +1,42 @@ +:original_name: mrs_01_1428.html + +.. _mrs_01_1428: + +SHOW TABLES +=========== + +Function +-------- + +**SHOW TABLES** command is used to list all tables in the current or a specific database. + +Syntax +------ + +**SHOW TABLES** *[IN db\_name];* + +Parameter Description +--------------------- + +.. table:: **Table 1** SHOW TABLE parameters + + +------------+---------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+===============================================================================================================+ + | IN db_name | Name of the database. This parameter is required only when tables of this specific database are to be listed. | + +------------+---------------------------------------------------------------------------------------------------------------+ + +Usage Guidelines +---------------- + +IN db_Name is optional. + +Examples +-------- + +**SHOW TABLES IN ProductDatabase;** + +System Response +--------------- + +All tables are listed. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/table_rename.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/table_rename.rst new file mode 100644 index 0000000..104990b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/ddl/table_rename.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_1430.html + +.. _mrs_01_1430: + +TABLE RENAME +============ + +Function +-------- + +This command is used to rename an existing table. + +Syntax +------ + +**ALTER TABLE** *[db_name.]table_name* **RENAME TO** *new_table_name*; + +Parameter Description +--------------------- + +.. table:: **Table 1** RENAME parameters + + +----------------+--------------------------------------------------------------------------------------+ + | Parameter | Description | + +================+======================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is selected. 
| + +----------------+--------------------------------------------------------------------------------------+ + | table_name | Current name of the existing table | + +----------------+--------------------------------------------------------------------------------------+ + | new_table_name | New name of the existing table | + +----------------+--------------------------------------------------------------------------------------+ + +Precautions +----------- + +- Parallel queries (using table names to obtain paths for reading CarbonData storage files) may fail during this operation. +- The secondary index table cannot be renamed. + +Example +------- + +**ALTER TABLE** *carbon* **RENAME TO** *carbondata*; + +**ALTER TABLE** *test_db.carbon* **RENAME TO** *test_db.carbondata*; + +System Response +--------------- + +The new table name will be displayed in the CarbonData folder. You can run **SHOW TABLES** to view the new table name. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/clean_files.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/clean_files.rst new file mode 100644 index 0000000..2d8349c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/clean_files.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_1448.html + +.. _mrs_01_1448: + +CLEAN FILES +=========== + +Function +-------- + +After the **DELETE SEGMENT** command is executed, the deleted segments are marked as the **delete** state. After the segments are merged, the status of the original segments changes to **compacted**. The data files of these segments are not physically deleted. If you want to forcibly delete these files, run the **CLEAN FILES** command. + +However, running this command may result in a query command execution failure. + +Syntax +------ + +**CLEAN FILES FOR TABLE**\ * [db_name.]table_name* ; + +Parameter Description +--------------------- + +.. table:: **Table 1** CLEAN FILES FOR TABLE parameters + + +------------+----------------------------------------------------------------------------------+ + | Parameter | Description | + +============+==================================================================================+ + | db_name | Database name. It consists of letters, digits, and underscores (_). | + +------------+----------------------------------------------------------------------------------+ + | table_name | Name of the database table. It consists of letters, digits, and underscores (_). | + +------------+----------------------------------------------------------------------------------+ + +Precautions +----------- + +None + +Examples +-------- + +**CLEAN FILES FOR TABLE** *CarbonDatabase.CarbonTable*; + +In this example, all the segments marked as **deleted** and **compacted** are physically deleted. + +System Response +--------------- + +Success or failure will be recorded in the driver logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/create_secondary_index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/create_secondary_index.rst new file mode 100644 index 0000000..b8566e2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/create_secondary_index.rst @@ -0,0 +1,60 @@ +:original_name: mrs_01_1445.html + +.. 
_mrs_01_1445: + +CREATE SECONDARY INDEX +====================== + +Function +-------- + +This command is used to create secondary indexes in the CarbonData tables. + +Syntax +------ + +**CREATE INDEX** *index_name* + +**ON TABLE** *[db_name.]table_name (col_name1, col_name2)* + +**AS** *'carbondata*' + +**PROPERTIES** *('table_blocksize'='256')*; + +Parameter Description +--------------------- + +.. table:: **Table 1** CREATE SECONDARY INDEX parameters + + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=================+===============================================================================================================================================+ + | index_name | Index table name. It consists of letters, digits, and special characters (_). | + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | db_name | Database name. It consists of letters, digits, and special characters (_). | + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | table_name | Name of the database table. It consists of letters, digits, and special characters (_). | + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | col_name | Name of a column in a table. Multiple columns are supported. It consists of letters, digits, and special characters (_). | + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | table_blocksize | Block size of a data file. For details, see :ref:`•Block Size `. | + +-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +**db_name** is optional. + +Examples +-------- + +- **CREATE INDEX productNameIndexTable on table productdb.productSalesTable (productName,city) as 'carbondata';** + + In this example, a secondary table named **productdb.productNameIndexTable** is created and index information of the provided column is loaded. + +- **CREATE INDEX t1_index1 on table t1 (c7_Datatype_Desc) AS 'carbondata' PROPERTIES('table_blocksize'='256');** + +System Response +--------------- + +A secondary index table will be created. Index information related to the provided column will be loaded into the secondary index table. The success message will be recorded in system logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_records_from_carbon_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_records_from_carbon_table.rst new file mode 100644 index 0000000..362a0d3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_records_from_carbon_table.rst @@ -0,0 +1,66 @@ +:original_name: mrs_01_1440.html + +.. _mrs_01_1440: + +DELETE RECORDS from CARBON TABLE +================================ + +Function +-------- + +This command is used to delete records from a CarbonData table. 
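+As a quick orientation before the formal syntax and parameters below, here is a minimal sketch of a typical delete followed by a verification query; the table name **sales_carbon** and the column **country** are made up for this illustration.
+
+.. code-block::
+
+   -- Hypothetical table and column names, for illustration only.
+   DELETE FROM sales_carbon WHERE country = 'XX';
+   -- Confirm that no matching rows remain.
+   SELECT count(*) FROM sales_carbon WHERE country = 'XX';
+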
+ +Syntax +------ + +**DELETE FROM CARBON_TABLE [WHERE expression];** + +Parameter Description +--------------------- + +.. table:: **Table 1** DELETE RECORDS parameters + + +--------------+-------------------------------------------------------------------------+ + | Parameter | Description | + +==============+=========================================================================+ + | CARBON TABLE | Name of the CarbonData table in which the DELETE operation is performed | + +--------------+-------------------------------------------------------------------------+ + +Precautions +----------- + +- If a segment is deleted, all secondary indexes associated with the segment are deleted as well. + +- If the **carbon.input.segments** property has been set for the queried table, the DELETE operation fails. To solve this problem, run the following statement before the query: + + Syntax: + + **SET carbon.input.segments. .=*;** + +Examples +-------- + +- Example 1: + + **delete from columncarbonTable1 d where d.column1 = 'country';** + +- Example 2: + + **delete from dest where column1 IN ('country1', 'country2');** + +- Example 3: + + **delete from columncarbonTable1 where column1 IN (select column11 from sourceTable2);** + +- Example 4: + + **delete from columncarbonTable1 where column1 IN (select column11 from sourceTable2 where column1 = 'USA');** + +- Example 5: + + **delete from columncarbonTable1 where column2 >= 4;** + +System Response +--------------- + +Success or failure will be recorded in the driver log and on the client. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_date.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_date.rst new file mode 100644 index 0000000..e9068d7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_date.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_1443.html + +.. _mrs_01_1443: + +DELETE SEGMENT by DATE +====================== + +Function +-------- + +This command is used to delete segments by loading date. Segments created before a specific date will be deleted. + +Syntax +------ + +**DELETE FROM TABLE db_name.table_name WHERE SEGMENT.STARTTIME BEFORE date_value**; + +Parameter Description +--------------------- + +.. table:: **Table 1** DELETE SEGMENT by DATE parameters + + +------------+----------------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+==============================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is used. | + +------------+----------------------------------------------------------------------------------------------+ + | table_name | Name of a table in the specified database | + +------------+----------------------------------------------------------------------------------------------+ + | date_value | Valid date when segments are started to be loaded. Segments before the date will be deleted. | + +------------+----------------------------------------------------------------------------------------------+ + +Precautions +----------- + +Segments cannot be deleted from the stream table. 
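+A possible end-to-end sketch of how this command can be combined with **SHOW SEGMENTS** (described later in this reference) to check load start times before deleting; the database name **salesdb**, the table name **sales_carbon**, and the timestamp are made up for this illustration.
+
+.. code-block::
+
+   -- Hypothetical database, table, and timestamp, for illustration only.
+   -- 1. Inspect the segments and note their Load Start Time values.
+   SHOW SEGMENTS FOR TABLE salesdb.sales_carbon;
+   -- 2. Delete every segment whose loading started before the chosen time.
+   DELETE FROM TABLE salesdb.sales_carbon WHERE SEGMENT.STARTTIME BEFORE '2021-01-01 00:00:00';
+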
+ +Example +------- + +**DELETE FROM TABLE db_name.table_name WHERE SEGMENT.STARTTIME BEFORE '2017-07-01 12:07:20'**; + +**STARTTIME** indicates the loading start time of different loads. + +System Response +--------------- + +Success or failure will be recorded in CarbonData logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_id.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_id.rst new file mode 100644 index 0000000..be4b52a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/delete_segment_by_id.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_1442.html + +.. _mrs_01_1442: + +DELETE SEGMENT by ID +==================== + +Function +-------- + +This command is used to delete segments by the ID. + +Syntax +------ + +**DELETE FROM TABLE db_name.table_name WHERE SEGMENT.ID IN (segment_id1,segment_id2)**; + +Parameter Description +--------------------- + +.. table:: **Table 1** DELETE SEGMENT parameters + + +------------+---------------------------------------------------------------------------------+ + | Parameter | Description | + +============+=================================================================================+ + | segment_id | ID of the segment to be deleted. | + +------------+---------------------------------------------------------------------------------+ + | db_name | Database name. If the parameter is not specified, the current database is used. | + +------------+---------------------------------------------------------------------------------+ + | table_name | The name of the table in a specific database. | + +------------+---------------------------------------------------------------------------------+ + +Usage Guidelines +---------------- + +Segments cannot be deleted from the stream table. + +Examples +-------- + +**DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0)**; + +**DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)**; + +System Response +--------------- + +Success or failure will be recorded in the CarbonData log. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/drop_secondary_index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/drop_secondary_index.rst new file mode 100644 index 0000000..4902f5a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/drop_secondary_index.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_1447.html + +.. _mrs_01_1447: + +DROP SECONDARY INDEX +==================== + +Function +-------- + +This command is used to delete the existing secondary index table in a specific table. + +Syntax +------ + +**DROP INDEX** *[IF EXISTS] index_name*\ ** ON** *[db_name.]table_name*; + +Parameter Description +--------------------- + +.. table:: **Table 1** DROP SECONDARY INDEX parameters + + +------------+----------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+========================================================================================+ + | index_name | Name of the index table. Table name contains letters, digits, and underscores (_). 
| + +------------+----------------------------------------------------------------------------------------+ + | db_Name | Name of the database. If the parameter is not specified, the current database is used. | + +------------+----------------------------------------------------------------------------------------+ + | table_name | Name of the table to be deleted. | + +------------+----------------------------------------------------------------------------------------+ + +Usage Guidelines +---------------- + +In this command, **IF EXISTS** and **db_name** are optional. + +Examples +-------- + +**DROP INDEX** *if exists productNameIndexTable* **ON** *productdb.productSalesTable*; + +System Response +--------------- + +Secondary Index Table will be deleted. Index information will be cleared in CarbonData table and the success message will be recorded in system logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/index.rst new file mode 100644 index 0000000..48d3ba3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/index.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1437.html + +.. _mrs_01_1437: + +DML +=== + +- :ref:`LOAD DATA ` +- :ref:`UPDATE CARBON TABLE ` +- :ref:`DELETE RECORDS from CARBON TABLE ` +- :ref:`INSERT INTO CARBON TABLE ` +- :ref:`DELETE SEGMENT by ID ` +- :ref:`DELETE SEGMENT by DATE ` +- :ref:`SHOW SEGMENTS ` +- :ref:`CREATE SECONDARY INDEX ` +- :ref:`SHOW SECONDARY INDEXES ` +- :ref:`DROP SECONDARY INDEX ` +- :ref:`CLEAN FILES ` +- :ref:`SET/RESET ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + load_data + update_carbon_table + delete_records_from_carbon_table + insert_into_carbon_table + delete_segment_by_id + delete_segment_by_date + show_segments + create_secondary_index + show_secondary_indexes + drop_secondary_index + clean_files + set_reset diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/insert_into_carbon_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/insert_into_carbon_table.rst new file mode 100644 index 0000000..1f80c6d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/insert_into_carbon_table.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_1441.html + +.. _mrs_01_1441: + +INSERT INTO CARBON TABLE +======================== + +Function +-------- + +This command is used to add the output of the SELECT command to a Carbon table. + +Syntax +------ + +**INSERT INTO [CARBON TABLE] [select query]**; + +Parameter Description +--------------------- + +.. table:: **Table 1** INSERT INTO parameters + + +--------------+---------------------------------------------------------------------------------------+ + | Parameter | Description | + +==============+=======================================================================================+ + | CARBON TABLE | Name of the CarbonData table to be inserted | + +--------------+---------------------------------------------------------------------------------------+ + | select query | SELECT query on the source table (CarbonData, Hive, and Parquet tables are supported) | + +--------------+---------------------------------------------------------------------------------------+ + +Precautions +----------- + +- A table has been created. 
+ +- You must belong to the data loading group in order to perform data loading operations. By default, the data loading group is named **ficommon**. + +- CarbonData tables cannot be overwritten. + +- The data type of the source table and the target table must be the same. Otherwise, data in the source table will be regarded as bad records. + +- The **INSERT INTO** command does not support partial success. If bad records exist, the command fails. + +- When you insert data of the source table to the target table, you cannot upload or update data of the source table. + + To enable data loading or updating during the INSERT operation, set the following parameter to **true**. + + **carbon.insert.persist.enable**\ =\ **true** + + By default, the preceding parameters are set to **false**. + + .. note:: + + Enabling this property will reduce the performance of the INSERT operation. + +Example +------- + +**INSERT INTO CARBON select \* from TABLENAME**; + +System Response +--------------- + +Success or failure will be recorded in the driver logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/load_data.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/load_data.rst new file mode 100644 index 0000000..776e032 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/load_data.rst @@ -0,0 +1,228 @@ +:original_name: mrs_01_1438.html + +.. _mrs_01_1438: + +LOAD DATA +========= + +Function +-------- + +This command is used to load user data of a particular type, so that CarbonData can provide good query performance. + +.. note:: + + Only the raw data on HDFS can be loaded. + +Syntax +------ + +**LOAD DATA** *INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...);* + +Parameter Description +--------------------- + +.. table:: **Table 1** LOAD DATA parameters + + +-------------+----------------------------------------------------------------------------------+ + | Parameter | Description | + +=============+==================================================================================+ + | folder_path | Path of the file or folder used for storing the raw CSV data. | + +-------------+----------------------------------------------------------------------------------+ + | db_name | Database name. If this parameter is not specified, the current database is used. | + +-------------+----------------------------------------------------------------------------------+ + | table_name | Name of a table in a database. | + +-------------+----------------------------------------------------------------------------------+ + +Precautions +----------- + +The following configuration items are involved during data loading: + +- **DELIMITER**: Delimiters and quote characters provided in the load command. The default value is a comma (**,**). + + *OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"')* + + You can use **'DELIMITER'='\\t'** to separate CSV data using tabs. + + OPTIONS('DELIMITER'='\\t') + + CarbonData also supports **\\001** and **\\017** as delimiters. + + .. note:: + + When the delimiter of CSV data is a single quotation mark ('), the single quotation mark must be enclosed in double quotation marks (" "). For example, 'DELIMITER'= "'". + +- **QUOTECHAR**: Delimiters and quote characters provided in the load command. The default value is double quotation marks (**"**). 
+ + *OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"')* + +- **COMMENTCHAR**: Comment characters provided in the load command. During data loading, if there is a comment character at the beginning of a line, the line is regarded as a comment line and data in the line will not be loaded. The default value is a pound key (#). + + *OPTIONS('COMMENTCHAR'='#')* + +- **FILEHEADER**: If the source file does not contain any header, add a header to the **LOAD DATA** command. + + *OPTIONS('FILEHEADER'='column1,column2')* + +- **ESCAPECHAR**: Is used to perform strict verification of the escape character on CSV files. The default value is backslash (**\\**). + + OPTIONS('ESCAPECHAR'='\\') + + .. note:: + + Enter **ESCAPECHAR** in the CSV data. **ESCAPECHAR** must be enclosed in double quotation marks (" "). For example, "a\\b". + +- .. _mrs_01_1438__en-us_topic_0000001219149099_lcf623574402c443e908646591898c2be: + + Bad records handling: + + In order for the data processing application to provide benefits, certain data integration is required. In most cases, data quality problems are caused by data sources. + + Methods of handling bad records are as follows: + + - Load all of the data before dealing with the errors. + - Clean or delete bad records before loading data or stop the loading when bad records are found. + + There are many options for clearing source data during CarbonData data loading, as listed in :ref:`Table 2 `. + + .. _mrs_01_1438__en-us_topic_0000001219149099_t1d4d77614e2b4b92b0f334d52702013b: + + .. table:: **Table 2** Bad Records Logger + + +---------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Default Value | Description | + +===========================+=======================+======================================================================================================================================================================================================================================================+ + | BAD_RECORDS_LOGGER_ENABLE | false | Whether to create logs with details about bad records | + +---------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | BAD_RECORDS_ACTION | FAIL | The four types of actions for bad records are as follows: | + | | | | + | | | - **FORCE**: Auto-corrects the data by storing the bad records as NULL. | + | | | - **REDIRECT**: Bad records are written to the raw CSV instead of being loaded. | + | | | - **IGNORE**: Bad records are neither loaded nor written to the raw CSV. | + | | | - **FAIL**: Data loading fails if any bad records are found. | + | | | | + | | | .. note:: | + | | | | + | | | In loaded data, if all records are bad records, **BAD_RECORDS_ACTION** is invalid and the load operation fails. 
| + +---------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | IS_EMPTY_DATA_BAD_RECORD | false | Whether empty data of a column to be considered as bad record or not. If this parameter is set to **false**, empty data ("",', or,) is not considered as bad records. If this parameter is set to **true**, empty data is considered as bad records. | + +---------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | BAD_RECORD_PATH | ``-`` | HDFS path where bad records are stored. The default value is **Null**. If bad records logging or bad records operation redirection is enabled, the path must be configured by the user. | + +---------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + Example: + + **LOAD DATA INPATH** *'filepath.csv'* **INTO TABLE** *tablename* *OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true',* *'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon', 'BAD_RECORDS_ACTION'='REDIRECT', 'IS_EMPTY_DATA_BAD_RECORD'='false');* + + .. note:: + + If **REDIRECT** is used, CarbonData will add all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You must clean up the source record for further data ingestion. This option is used to remind you which records are bad. + +- **MAXCOLUMNS**: (Optional) Specifies the maximum number of columns parsed by a CSV parser in a line. + + *OPTIONS('MAXCOLUMNS'='400')* + + .. table:: **Table 3** MAXCOLUMNS + + ============================== ============= ============= + Name of the Optional Parameter Default Value Maximum Value + ============================== ============= ============= + MAXCOLUMNS 2000 20000 + ============================== ============= ============= + + .. 
table:: **Table 4** Behavior chart of MAXCOLUMNS + + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + | MAXCOLUMNS Value | Number of Columns in the File Header | Final Value Considered | + +===============================+======================================+=============================================================================+ + | Not specified in Load options | 5 | 2000 | + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + | Not specified in Load options | 6000 | 6000 | + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + | 40 | 7 | Max (column count of file header, MAXCOLUMNS value) | + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + | 22000 | 40 | 20000 | + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + | 60 | Not specified in Load options | Max (Number of columns in the first line of the CSV file, MAXCOLUMNS value) | + +-------------------------------+--------------------------------------+-----------------------------------------------------------------------------+ + + .. note:: + + There must be sufficient executor memory for setting the maximum value of **MAXCOLUMNS Option**. Otherwise, data loading will fail. + +- If **SORT_SCOPE** is set to **GLOBAL_SORT** during table creation, you can specify the number of partitions to be used when sorting data. If this parameter is not set or is set to a value less than **1**, the number of map tasks is used as the number of reduce tasks. It is recommended that each reduce task process 512 MB to 1 GB data. + + *OPTIONS('GLOBAL_SORT_PARTITIONS'='2')* + + .. note:: + + To increase the number of partitions, you may need to increase the value of **spark.driver.maxResultSize**, as the sampling data collected in the driver increases with the number of partitions. + +- **DATEFORMAT**: Specifies the date format of the table. + + *OPTIONS('DATEFORMAT'='dateFormat')* + + .. note:: + + Date formats are specified by date pattern strings. The date pattern letters in Carbon are same as in JAVA. + +- **TIMESTAMPFORMAT**: Specifies the timestamp of a table. +- *OPTIONS('TIMESTAMPFORMAT'='timestampFormat')* + +- **SKIP_EMPTY_LINE**: Ignores empty rows in the CSV file during data loading. + + *OPTIONS('SKIP_EMPTY_LINE'='TRUE/FALSE')* + +- **Optional:** **SCALE_FACTOR**: Used to control the number of partitions for **RANGE_COLUMN**, **SCALE_FACTOR**. The formula is as follows: + + .. code-block:: text + + splitSize = max(blocklet_size, (block_size - blocklet_size)) * scale_factor + numPartitions = total size of input data / splitSize + + The default value is **3**. The value ranges from **1** to **300**. + + *OPTIONS('SCALE_FACTOR'='10')* + + .. note:: + + - If **GLOBAL_SORT_PARTITIONS** and **SCALE_FACTOR** are used at the same time, only **GLOBAL_SORT_PARTITIONS** is valid. + - The compaction on **RANGE_COLUMN** will use **LOCAL_SORT** by default. 
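+To show how several of the options described above can be combined in a single statement, the following is a possible sketch; the HDFS path, database, table, and column names are made up for this illustration, and **GLOBAL_SORT_PARTITIONS** is assumed to take effect only because the table is assumed to have been created with **SORT_SCOPE** set to **GLOBAL_SORT**.
+
+.. code-block::
+
+   -- Hypothetical path, table, and columns, for illustration only.
+   LOAD DATA INPATH 'hdfs://hacluster/tmp/sample_orders.csv' INTO TABLE ordersdb.orders_carbon
+   OPTIONS('DELIMITER'=',',
+           'QUOTECHAR'='"',
+           'FILEHEADER'='order_id,customer,amount,order_date',
+           'DATEFORMAT'='yyyy-MM-dd',
+           'BAD_RECORDS_LOGGER_ENABLE'='true',
+           'BAD_RECORDS_ACTION'='REDIRECT',
+           'BAD_RECORD_PATH'='hdfs://hacluster/tmp/carbon_badrecords',
+           'GLOBAL_SORT_PARTITIONS'='4');
+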
+ +Scenarios +--------- + +To load a CSV file to a CarbonData table, run the following statement: + +**LOAD DATA** *INPATH 'folder path' INTO TABLE tablename OPTIONS(property_name=property_value, ...);* + +Examples +-------- + +**LOAD DATA** *inpath 'hdfs://hacluster/src/test/resources/data.csv' INTO table carbontable* + +*options('DELIMITER'=',',* + +*'QUOTECHAR'='"',* + +*'COMMENTCHAR'='#',* + +*'ESCAPECHAR'*\ ='\\', + +*'FILEHEADER'='empno,empname,designation,doj,* + +*workgroupcategory,workgroupcategoryname,* + +*deptno,deptname,projectcode,projectjoindate,* + +*projectenddate,attendance,utilization,salary'*, + +*'DATEFORMAT' = 'yyyy-MM-dd'* + +*);* + +System Response +--------------- + +Success or failure will be recorded in the driver logs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/set_reset.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/set_reset.rst new file mode 100644 index 0000000..cd4e607 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/set_reset.rst @@ -0,0 +1,143 @@ +:original_name: mrs_01_1449.html + +.. _mrs_01_1449: + +SET/RESET +========= + +Function +-------- + +This command is used to dynamically add, update, display, or reset the CarbonData properties without restarting the driver. + +Syntax +------ + +- Add or Update parameter value: + + **SET** *parameter_name*\ =\ *parameter_value* + + This command is used to add or update the value of **parameter_name**. + +- Display property value: + + **SET** *parameter_name* + + This command is used to display the value of **parameter_name**. + +- Display session parameter: + + **SET** + + This command is used to display all supported session parameters. + +- Display session parameters along with usage details: + + **SET** -v + + This command is used to display all supported session parameters and their usage details. + +- Reset parameter value: + + **RESET** + + This command is used to clear all session parameters. + +Parameter Description +--------------------- + +.. table:: **Table 1** SET parameters + + +-----------------+----------------------------------------------------------------------------------------+ + | Parameter | Description | + +=================+========================================================================================+ + | parameter_name | Name of the parameter whose value needs to be dynamically added, updated, or displayed | + +-----------------+----------------------------------------------------------------------------------------+ + | parameter_value | New value of **parameter_name** to be set | + +-----------------+----------------------------------------------------------------------------------------+ + +Precautions +----------- + +The following table lists the properties which you can set or clear using the SET or RESET command. + +.. 
table:: **Table 2** Properties + + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Property | Description | + +==========================================+================================================================================================================================================================================================================+ + | carbon.options.bad.records.logger.enable | Whether to enable bad record logger. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.options.bad.records.action | Operations on bad records, for example, force, redirect, fail, or ignore. For more information, see :ref:`•Bad record handling `. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.options.is.empty.data.bad.record | Whether the empty data is considered as a bad record. For more information, see :ref:`Bad record handling `. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.options.sort.scope | Scope of the sort during data loading. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.options.bad.record.path | HDFS path where bad records are stored. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.custom.block.distribution | Whether to enable Spark or CarbonData block distribution. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.unsafe.sort | Whether to use unsafe sort during data loading. Unsafe sort reduces the garbage collection during data loading, thereby achieving better performance. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.si.lookup.partialstring | If this is set to **TRUE**, the secondary index uses the starts-with, ends-with, contains, and LIKE partition condition strings. | + | | | + | | If this is set to **FALSE**, the secondary index uses only the starts-with partition condition string. 
| + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.input.segments | Segment ID to be queried. This property allows you to query a specified segment of a specified table. CarbonScan reads data only from the specified segment ID. | + | | | + | | Syntax: | + | | | + | | **carbon.input.segments. . = < list of segment ids >** | + | | | + | | If you want to query a specified segment in multi-thread mode, you can use **CarbonSession.threadSet** instead of the **SET** statement. | + | | | + | | Syntax: | + | | | + | | **CarbonSession.threadSet ("carbon.input.segments. . ","< list of segment ids >");** | + | | | + | | .. note:: | + | | | + | | You are advised not to set this property in the **carbon.properties** file because all sessions contain the segment list unless session-level or thread-level overwriting occurs. | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Examples +-------- + +- Add or Update: + + **SET** *enable.unsafe.sort*\ =\ *true* + +- Display property value: + + **SET** *enable.unsafe.sort* + +- Show the segment ID list, segment status, and other required details, and specify the segment list to be read: + + **SHOW SEGMENTS FOR** *TABLE carbontable1;* + + **SET** *carbon.input.segments.db.carbontable1 = 1, 3, 9;* + +- Query a specified segment in multi-thread mode: + + **CarbonSession.threadSet** (*"carbon.input.segments.default.carbon_table_MulTI_THread", "1,3"*); + +- Use **CarbonSession.threadSet** to query segments in a multi-thread environment (Scala code is used as an example): + + .. code-block:: + + def main(args: Array[String]) { + Future { CarbonSession.threadSet("carbon.input.segments.default.carbon_table_MulTI_THread", "1") + spark.sql("select count(empno) from carbon_table_MulTI_THread").show() + } + } + +- Reset: + + **RESET** + +System Response +--------------- + +- Success will be recorded in the driver log. +- Failure will be displayed on the UI. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_secondary_indexes.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_secondary_indexes.rst new file mode 100644 index 0000000..16dcd0c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_secondary_indexes.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_1446.html + +.. _mrs_01_1446: + +SHOW SECONDARY INDEXES +====================== + +Function +-------- + +This command is used to list all secondary index tables in the CarbonData table. + +Syntax +------ + +**SHOW INDEXES ON db_name.table_name**; + +Parameter Description +--------------------- + +.. table:: **Table 1** SHOW SECONDARY INDEXES parameters + + +------------+-----------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+=========================================================================================+ + | db_name | Database name. It consists of letters, digits, and special characters (_). 
| + +------------+-----------------------------------------------------------------------------------------+ + | table_name | Name of the database table. It consists of letters, digits, and special characters (_). | + +------------+-----------------------------------------------------------------------------------------+ + +Precautions +----------- + +**db_name** is optional. + +Examples +-------- + +**SHOW INDEXES ON productsales.product**; + +System Response +--------------- + +All index tables and corresponding index columns in a given CarbonData table will be listed. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_segments.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_segments.rst new file mode 100644 index 0000000..9fd40cf --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/show_segments.rst @@ -0,0 +1,53 @@ +:original_name: mrs_01_1444.html + +.. _mrs_01_1444: + +SHOW SEGMENTS +============= + +Function +-------- + +This command is used to list the segments of a CarbonData table. + +Syntax +------ + +**SHOW SEGMENTS FOR TABLE** *[db_name.]table_name* **LIMIT** *number_of_loads;* + +Parameter Description +--------------------- + +.. table:: **Table 1** SHOW SEGMENTS FOR TABLE parameters + + +-----------------+----------------------------------------------------------------------------------+ + | Parameter | Description | + +=================+==================================================================================+ + | db_name | Database name. If this parameter is not specified, the current database is used. | + +-----------------+----------------------------------------------------------------------------------+ + | table_name | Name of a table in the specified database | + +-----------------+----------------------------------------------------------------------------------+ + | number_of_loads | Threshold of records to be listed | + +-----------------+----------------------------------------------------------------------------------+ + +Precautions +----------- + +None + +Examples +-------- + +**SHOW SEGMENTS FOR TABLE** *CarbonDatabase.CarbonTable* **LIMIT** *2;* + +System Response +--------------- + +.. code-block:: + + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format | + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ + | 3 | Success | 2020-09-28 22:53:26.336 | 3.726S | {} | 6.47KB | 3.30KB | columnar_v3 | + | 2 | Success | 2020-09-28 22:53:01.702 | 6.688S | {} | 6.47KB | 3.30KB | columnar_v3 | + +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+--+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/update_carbon_table.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/update_carbon_table.rst new file mode 100644 index 0000000..77b813d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/dml/update_carbon_table.rst @@ -0,0 +1,94 @@ +:original_name: mrs_01_1439.html + +.. 
_mrs_01_1439: + +UPDATE CARBON TABLE +=================== + +Function +-------- + +This command is used to update the CarbonData table based on the column expression and optional filtering conditions. + +Syntax +------ + +- Syntax 1: + + **UPDATE SET (column_name1, column_name2, ... column_name n) = (column1_expression , column2_expression , column3_expression ... column n_expression ) [ WHERE { } ];** + +- Syntax 2: + + **UPDATE SET (column_name1, column_name2,) = (select sourceColumn1, sourceColumn2 from sourceTable [ WHERE { } ] ) [ WHERE { } ];** + +Parameter Description +--------------------- + +.. table:: **Table 1** UPDATE parameters + + +--------------+-------------------------------------------------------------------------------+ + | Parameter | Description | + +==============+===============================================================================+ + | CARBON TABLE | Name of the CarbonData table to be updated | + +--------------+-------------------------------------------------------------------------------+ + | column_name | Target column to be updated | + +--------------+-------------------------------------------------------------------------------+ + | sourceColumn | Column value of the source table that needs to be updated in the target table | + +--------------+-------------------------------------------------------------------------------+ + | sourceTable | Table from which the records are updated to the target table | + +--------------+-------------------------------------------------------------------------------+ + +Precautions +----------- + +Note the following before running this command: + +- The UPDATE command fails if multiple input rows in the source table are matched with a single row in the target table. + +- If the source table generates empty records, the UPDATE operation completes without updating the table. + +- If rows in the source table do not match any existing rows in the target table, the UPDATE operation completes without updating the table. + +- UPDATE is not allowed in the table with secondary index. + +- In a subquery, if the source table and target table are the same, the UPDATE operation fails. + +- The UPDATE operation fails if the subquery used in the UPDATE command contains an aggregate function or a GROUP BY clause. + + For example, **update t_carbn01 a set (a.item_type_code, a.profit) = ( select b.item_type_cd, sum(b.profit) from t_carbn01b b where item_type_cd =2 group by item_type_code);**. + + In the preceding example, aggregate function **sum(b.profit)** and GROUP BY clause are used in the subquery. As a result, the UPDATE operation will fail. + +- If the **carbon.input.segments** property has been set for the queried table, the UPDATE operation fails. To solve this problem, run the following statement before the query: + + Syntax: + + **SET carbon.input.segments. . 
=\***; + +Examples +-------- + +- Example 1: + + **update carbonTable1 d set (d.column3,d.column5 ) = (select s.c33 ,s.c55 from sourceTable1 s where d.column1 = s.c11) where d.column1 = 'country' and exists( select \* from table3 o where o.c2 > 1);** + +- Example 2: + + **update carbonTable1 d set (c3) = (select s.c33 from sourceTable1 s where d.column1 = s.c11) where exists( select \* from iud.other o where o.c2 > 1);** + +- Example 3: + + **update carbonTable1 set (c2, c5 ) = (c2 + 1, concat(c5 , "y" ));** + +- Example 4: + + **update carbonTable1 d set (c2, c5 ) = (c2 + 1, "xyx") where d.column1 = 'india';** + +- Example 5: + + **update carbonTable1 d set (c2, c5 ) = (c2 + 1, "xyx") where d.column1 = 'india' and exists( select \* from table3 o where o.column2 > 1);** + +System Response +--------------- + +Success or failure will be recorded in the driver log and on the client. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/index.rst new file mode 100644 index 0000000..c588d29 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1423.html + +.. _mrs_01_1423: + +CarbonData Syntax Reference +=========================== + +- :ref:`DDL ` +- :ref:`DML ` +- :ref:`Operation Concurrent Execution ` +- :ref:`API ` +- :ref:`Spatial Indexes ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + ddl/index + dml/index + operation_concurrent_execution + api + spatial_indexes diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/operation_concurrent_execution.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/operation_concurrent_execution.rst new file mode 100644 index 0000000..c033743 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/operation_concurrent_execution.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_24046.html + +.. _mrs_01_24046: + +Operation Concurrent Execution +============================== + +Before performing :ref:`DDL ` and :ref:`DML ` operations, you need to obtain the corresponding locks. See :ref:`Table 1 ` for details about the locks that need to be obtained for each operation. The check mark (Y) indicates that the lock is required. An operation can be performed only after all required locks are obtained. + +To check whether any two operations can be executed concurrently, locate the rows for these two operations in :ref:`Table 1 `. If there is no lock column in which both rows are marked with the check mark (Y), the two operations do not contend for any lock and can therefore be executed concurrently. + +.. _mrs_01_24046__en-us_topic_0000001173789226_table1548533815231: + +..
table:: **Table 1** List of obtaining locks for operations + + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | Operation | METADATA_LOCK | COMPACTION_LOCK | DROP_TABLE_LOCK | DELETE_SEGMENT_LOCK | CLEAN_FILES_LOCK | ALTER_PARTITION_LOCK | UPDATE_LOCK | STREAMING_LOCK | CONCURRENT_LOAD_LOCK | SEGMENT_LOCK | + +==================================+===============+=================+=================+=====================+==================+======================+=============+================+======================+==============+ + | CREATE TABLE | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | CREATE TABLE As SELECT | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DROP TABLE | Y | ``-`` | Y | ``-`` | ``-`` | ``-`` | ``-`` | Y | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | ALTER TABLE COMPACTION | ``-`` | Y | ``-`` | ``-`` | ``-`` | ``-`` | Y | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | TABLE RENAME | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | ADD COLUMNS | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DROP COLUMNS | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | CHANGE DATA TYPE | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | REFRESH TABLE | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + 
+----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | REGISTER INDEX TABLE | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | REFRESH INDEX | ``-`` | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | LOAD DATA/INSERT INTO | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | Y | Y | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | UPDATE CARBON TABLE | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | Y | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DELETE RECORDS from CARBON TABLE | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | Y | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DELETE SEGMENT by ID | ``-`` | ``-`` | ``-`` | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DELETE SEGMENT by DATE | ``-`` | ``-`` | ``-`` | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | SHOW SEGMENTS | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | CREATE SECONDARY INDEX | Y | Y | ``-`` | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | SHOW SECONDARY INDEXES | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | DROP SECONDARY INDEX | Y | ``-`` | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | 
``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | CLEAN FILES | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | SET/RESET | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | Add Hive Partition | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | Drop Hive Partition | Y | Y | Y | Y | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | Drop Partition | Y | Y | Y | Y | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ + | Alter table set | Y | Y | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | ``-`` | + +----------------------------------+---------------+-----------------+-----------------+---------------------+------------------+----------------------+-------------+----------------+----------------------+--------------+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/spatial_indexes.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/spatial_indexes.rst new file mode 100644 index 0000000..2364854 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_syntax_reference/spatial_indexes.rst @@ -0,0 +1,656 @@ +:original_name: mrs_01_1451.html + +.. _mrs_01_1451: + +Spatial Indexes +=============== + +Quick Example +------------- + +.. 
code-block:: + + create table IF NOT EXISTS carbonTable + ( + COLUMN1 BIGINT, + LONGITUDE BIGINT, + LATITUDE BIGINT, + COLUMN2 BIGINT, + COLUMN3 BIGINT + ) + STORED AS carbondata + TBLPROPERTIES ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.850713','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='115.828503','SPATIAL_INDEX.mygeohash.maxLongitude'='720.000000','SPATIAL_INDEX.mygeohash.minLatitude'='39.850713','SPATIAL_INDEX.mygeohash.maxLatitude'='720.000000','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='1000000','SORT_COLUMNS'='column1,column2,column3,latitude,longitude'); + +Introduction to Spatial Indexes +------------------------------- + +Spatial data includes multidimensional points, lines, rectangles, cubes, polygons, and other geometric objects. A spatial data object occupies a certain region of space, called its spatial scope, which is characterized by its location and boundary. The spatial data can be either point data or region data. + +- Point data: A point has a spatial extent characterized completely by its location. It does not occupy space and has no associated boundary. Point data consists of a collection of points in a two-dimensional space. Points can be stored as a pair of longitude and latitude. +- Region data: A region has a spatial extent with a location and a boundary. The location can be considered as the position of a fixed point in the region, such as its centroid. In two dimensions, the boundary can be visualized as a line (for finite regions, a closed loop). Region data contains a collection of regions. + +Currently, only point data is supported and can be stored. + +Longitude and latitude can be encoded as a unique GeoID. Geohash is a public-domain geocoding system invented by Gustavo Niemeyer. It encodes a geographic location into a short string of letters and digits. It is a hierarchical spatial data structure that subdivides space into grid-shaped buckets and is one of the many applications of the Z-order curve and, more generally, of space-filling curves. + +The Z value of a point in multiple dimensions is calculated by interleaving the binary representations of its coordinate values, as shown in the following figure. When Geohash is used to create a GeoID, data is sorted by GeoID instead of longitude and latitude. Data is stored by spatial proximity. + +|image1| + +Creating a Table +---------------- + +**GeoHash encoding**: + +.. code-block:: + + create table IF NOT EXISTS carbonTable + ( + ... + `LONGITUDE` BIGINT, + `LATITUDE` BIGINT, + ... + ) + STORED AS carbondata + TBLPROPERTIES ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='xx.xxxxxx','SPATIAL_INDEX.mygeohash.gridSize'='xx','SPATIAL_INDEX.mygeohash.minLongitude'='xxx.xxxxxx','SPATIAL_INDEX.mygeohash.maxLongitude'='xxx.xxxxxx','SPATIAL_INDEX.mygeohash.minLatitude'='xx.xxxxxx','SPATIAL_INDEX.mygeohash.maxLatitude'='xxx.xxxxxx','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='1000000','SORT_COLUMNS'='column1,column2,column3,latitude,longitude'); + +**SPATIAL_INDEX** is a user-defined index handler. This handler allows users to create new columns from the table-structure column set. The new column name is the same as that of the handler name. The **type** and **sourcecolumns** properties of the handler are mandatory.
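To make the handler concept concrete, the following minimal sketch chains together statements shown in this section: it creates a GeoHash-indexed table using the property values from the quick example (only the mandatory handler properties are set here), loads a CSV file, and then selects the generated **mygeohash** handler column while filtering with the **IN_POLYGON** UDF described later in this section. The table name, file path, and CSV layout are illustrative assumptions rather than fixed names. + +.. code-block:: + + -- hypothetical table; the handler property values follow the quick example above + create table IF NOT EXISTS geoExample + ( + timevalue BIGINT, + longitude BIGINT, + latitude BIGINT + ) + STORED AS carbondata + TBLPROPERTIES ('SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.850713','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.conversionRatio'='1000000','SORT_COLUMNS'='timevalue,latitude,longitude'); + + -- load a CSV assumed to contain timevalue,longitude,latitude rows, then query the generated handler column + LOAD DATA inpath '/tmp/geoExample.csv' INTO TABLE geoExample OPTIONS ('DELIMITER'= ','); + select longitude, latitude, mygeohash from geoExample where IN_POLYGON('116.321011 40.123503, 116.137676 39.947911, 116.560993 39.935276, 116.321011 40.123503'); + +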
Currently, the value of **type** supports only **geohash**. Carbon provides a default implementation class that can be easily used. You can extend the default implementation class to mount the customized implementation class of **geohash**. The default handler also needs to provide the following table properties: + +- **SPATIAL_INDEX.**\ *xxx*\ **.originLatitude**: specifies the origin latitude. (**Double** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.gridSize**: specifies the grid length in meters. (**Int** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.minLongitude**: specifies the minimum longitude. (**Double** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.maxLongitude**: specifies the maximum longitude. (**Double** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.minLatitude**: specifies the minimum latitude. (**Double** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.maxLatitude**: specifies the maximum latitude. (**Double** type.) +- **SPATIAL_INDEX.**\ *xxx*\ **.conversionRatio**: used to convert the small value of the longitude and latitude to an integer. (**Int** type.) + +You can add your own table properties to the handlers in the above format and access them in your custom implementation class. **originLatitude**, **gridSize**, and **conversionRatio** are mandatory. Other parameters are optional in Carbon. You can use the **SPATIAL_INDEX.**\ *xxx*\ **.class** property to specify their implementation classes. + +The default implementation class can generate handler column values for **sourcecolumns** in each row and support query based on the **sourcecolumns** filter criteria. The generated handler column is invisible to users. Except the **SORT_COLUMNS** table properties, no DDL commands or properties are allowed to contain the handler column. + +.. note:: + + - By default, the generated handler column is regarded as the sorting column. If **SORT_COLUMNS** does not contain any **sourcecolumns**, add the handler column to the end of the existing **SORT_COLUMNS**. If the handler column has been specified in **SORT_COLUMNS**, its order in **SORT_COLUMNS** remains unchanged. + - If **SORT_COLUMNS** contains any **sourcecolumns** but does not contain the handler column, the handler column is automatically inserted before **sourcecolumns** in **SORT_COLUMNS**. + - If **SORT_COLUMNS** needs to contain any **sourcecolumns**, ensure that the handler column is listed before the **sourcecolumns** so that the handler column can take effect during sorting. + +**GeoSOT encoding**: + +.. code-block:: + + CREATE TABLE carbontable( + ... + longitude DOUBLE, + latitude DOUBLE, + ...) + STORED AS carbondata + TBLPROPERTIES ('SPATIAL_INDEX'='xxx', + 'SPATIAL_INDEX.xxx.type'='geosot', + 'SPATIAL_INDEX.xxx.sourcecolumns'='longitude, latitude', + 'SPATIAL_INDEX.xxx.level'='21', + 'SPATIAL_INDEX.xxx.class'='org.apache.carbondata.geo.GeoSOTIndex') + +.. table:: **Table 1** Parameter description + + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=================================+=========================================================================================================================================================================================+ + | SPATIAL_INDEX | Specifies the spatial index. Its value is the same as the column name. 
| + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SPATIAL_INDEX.xxx.type | (Mandatory) The value is set to **geosot**. | + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SPATIAL_INDEX.xxx.sourcecolumns | (Mandatory) Specifies the source columns for calculating the spatial index. The value must be two existing columns of the double type. | + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SPATIAL_INDEX.xxx.level | (Optional) Specifies the columns for calculating the spatial index. The default value is **17**, through which you can obtain an accurate result and improve the computing performance. | + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SPATIAL_INDEX.xxx.class | (Optional) Specifies the implementation class of GeoSOT. The default value is **org.apache.carbondata.geo.GeoSOTIndex**. | + +---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Example: + +.. code-block:: + + create table geosot( + timevalue bigint, + longitude double, + latitude double) + stored as carbondata + TBLPROPERTIES ('SPATIAL_INDEX'='mygeosot', + 'SPATIAL_INDEX.mygeosot.type'='geosot', + 'SPATIAL_INDEX.mygeosot.level'='21', 'SPATIAL_INDEX.mygeosot.sourcecolumns'='longitude, latitude'); + +.. _mrs_01_1451__en-us_topic_0000001173789446_section106234720257: + +Preparing Data +-------------- + +- Data file 1: **geosotdata.csv** + + .. code-block:: + + timevalue,longitude,latitude + 1575428400000,116.285807,40.084087 + 1575428400000,116.372142,40.129503 + 1575428400000,116.187332,39.979316 + 1575428400000,116.337069,39.951887 + 1575428400000,116.359102,40.154684 + 1575428400000,116.736367,39.970323 + 1575428400000,116.720179,40.009893 + 1575428400000,116.346961,40.13355 + 1575428400000,116.302895,39.930753 + 1575428400000,116.288955,39.999101 + 1575428400000,116.17609,40.129953 + 1575428400000,116.725575,39.981115 + 1575428400000,116.266922,40.179415 + 1575428400000,116.353706,40.156483 + 1575428400000,116.362699,39.942444 + 1575428400000,116.325378,39.963129 + +- Data file 2: **geosotdata2.csv** + + .. 
code-block:: + + timevalue,longitude,latitude + 1575428400000,120.17708,30.326882 + 1575428400000,120.180685,30.326327 + 1575428400000,120.184976,30.327105 + 1575428400000,120.189311,30.327549 + 1575428400000,120.19446,30.329698 + 1575428400000,120.186965,30.329133 + 1575428400000,120.177481,30.328911 + 1575428400000,120.169713,30.325614 + 1575428400000,120.164563,30.322243 + 1575428400000,120.171558,30.319613 + 1575428400000,120.176365,30.320687 + 1575428400000,120.179669,30.323688 + 1575428400000,120.181001,30.320761 + 1575428400000,120.187094,30.32354 + 1575428400000,120.193574,30.323651 + 1575428400000,120.186192,30.320132 + 1575428400000,120.190055,30.317464 + 1575428400000,120.195376,30.318094 + 1575428400000,120.160786,30.317094 + 1575428400000,120.168211,30.318057 + 1575428400000,120.173618,30.316612 + 1575428400000,120.181001,30.317316 + 1575428400000,120.185162,30.315908 + 1575428400000,120.192415,30.315871 + 1575428400000,120.161902,30.325614 + 1575428400000,120.164306,30.328096 + 1575428400000,120.197093,30.325985 + 1575428400000,120.19602,30.321651 + 1575428400000,120.198638,30.32354 + 1575428400000,120.165421,30.314834 + +Importing Data +-------------- + +The GeoHash default implementation class extends the customized index abstract class. If the handler property is not set to a customized implementation class, the default implementation class is used. You can extend the default implementation class to mount the customized implementation class of **geohash**. The methods of the customized index abstract class are as follows: + +- **Init** method: Used to extract, verify, and store the handler property. If the operation fails, the system throws an exception and displays the error information. +- **Generate** method: Used to generate indexes. It generates an index for each row of data. +- **Query** method: Used to generate an index value range list for given input. + +The commands for importing data are the same as those for importing common Carbon tables. + +**LOAD DATA inpath '/tmp/**\ *geosotdata.csv*\ **' INTO TABLE geosot OPTIONS ('DELIMITER'= ',');** + +**LOAD DATA inpath '/tmp/**\ *geosotdata2.csv*\ **' INTO TABLE geosot OPTIONS ('DELIMITER'= ',');** + +.. note:: + + For details about **geosotdata.csv** and **geosotdata2.csv**, see :ref:`Preparing Data `. + +Aggregate Query of Irregular Spatial Sets +----------------------------------------- + +**Query statements and filter UDFs** + +- Filtering data based on polygon + + **IN_POLYGON(pointList)** + + UDF input parameter + + +-----------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+========+================================================================================================================================================================================================================================================================================================+ + | pointList | String | Enter multiple points as a string. Each point is presented as **longitude latitude**. Longitude and latitude are separated by a space. Each pair of longitude and latitude is separated by a comma (,). The longitude and latitude values at the start and end of the string must be the same. 
| + +-----------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + UDF output parameter + + +-----------+---------+-----------------------------------------------------------+ + | Parameter | Type | Description | + +===========+=========+===========================================================+ + | inOrNot | Boolean | Checks whether data is in the specified **polygon_list**. | + +-----------+---------+-----------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude from geosot where IN_POLYGON('116.321011 40.123503, 116.137676 39.947911, 116.560993 39.935276, 116.321011 40.123503'); + +- Filtering data based on the polygon list + + **IN_POLYGON_LIST(polygonList, opType)** + + UDF input parameters + + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+=======================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | polygonList | String | Inputs multiple polygons as a string. Each polygon is presented as **POLYGON ((longitude1 latitude1, longitude2 latitude2, …))**. Note that there is a space after **POLYGON**. Longitudes and latitudes are separated by spaces. Each pair of longitude and latitude is separated by a comma (,). The longitudes and latitudes at the start and end of a polygon must be the same. **IN_POLYGON_LIST** requires at least two polygons. | + | | | | + | | | Example: | + | | | | + | | | .. code-block:: | + | | | | + | | | POLYGON ((116.137676 40.163503, 116.137676 39.935276, 116.560993 39.935276, 116.137676 40.163503)) | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | opType | String | Performs union, intersection, and subtraction on multiple polygons. | + | | | | + | | | Currently, the following operation types are supported: | + | | | | + | | | - OR: A U B U C (Assume that three polygons A, B, and C are input.) 
| + | | | - AND: A ∩ B ∩ C | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + UDF output parameter + + +-----------+---------+-----------------------------------------------------------+ + | Parameter | Type | Description | + +===========+=========+===========================================================+ + | inOrNot | Boolean | Checks whether data is in the specified **polygon_list**. | + +-----------+---------+-----------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude from geosot where IN_POLYGON_LIST('POLYGON ((120.176433 30.327431,120.171283 30.322245,120.181411 30.314540, 120.190509 30.321653,120.185188 30.329358,120.176433 30.327431)), POLYGON ((120.191603 30.328946,120.184179 30.327465,120.181819 30.321464, 120.190359 30.315388,120.199242 30.324464,120.191603 30.328946))', 'OR'); + +- Filtering data based on the polyline list + + **IN_POLYLINE_LIST(polylineList, bufferInMeter)** + + UDF input parameters + + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+=======================+==========================================================================================================================================================================================================================================================================================================+ + | polylineList | String | Inputs multiple polylines as a string. Each polyline is presented as **LINESTRING (longitude1 latitude1, longitude2 latitude2, …)**. Note that there is a space after **LINESTRING**. Longitudes and latitudes are separated by spaces. Each pair of longitude and latitude is separated by a comma (,). | + | | | | + | | | A union will be output based on the data in multiple polylines. | + | | | | + | | | Example: | + | | | | + | | | .. code-block:: | + | | | | + | | | LINESTRING (116.137676 40.163503, 116.137676 39.935276, 116.260993 39.935276) | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bufferInMeter | Float | Polyline buffer distance, in meters. Right angles are used at the end to create a buffer. 
| + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + UDF output parameter + + +-----------+---------+------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+=========+============================================================+ + | inOrNot | Boolean | Checks whether data is in the specified **polyline_list**. | + +-----------+---------+------------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude from geosot where IN_POLYLINE_LIST('LINESTRING (120.184179 30.327465, 120.191603 30.328946, 120.199242 30.324464, 120.190359 30.315388)', 65); + +- Filtering data based on the GeoID range list + + **IN_POLYGON_RANGE_LIST(polygonRangeList, opType)** + + UDF input parameters + + +-----------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+=======================+======================================================================================================================================================================================================================================================================================================+ + | polygonRangeList | String | Inputs multiple rangeLists as a string. Each rangeList is presented as **RANGELIST (startGeoId1 endGeoId1, startGeoId2 endGeoId2, …)**. Note that there is a space after **RANGELIST**. Start GeoIDs and end GeoIDs are separated by spaces. Each group of GeoID ranges is separated by a comma (,). | + | | | | + | | | Example: | + | | | | + | | | .. code-block:: | + | | | | + | | | RANGELIST (855279368848 855279368850, 855280799610 855280799612, 855282156300 855282157400) | + +-----------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | opType | String | Performs union, intersection, and subtraction on multiple rangeLists. | + | | | | + | | | Currently, the following operation types are supported: | + | | | | + | | | - OR: A U B U C (Assume that three rangeLists A, B, and C are input.) 
| + | | | - AND: A ∩ B ∩ C | + +-----------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + UDF output parameter + + +-----------+---------+-------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+=========+=============================================================+ + | inOrNot | Boolean | Checks whether data is in the specified **polyRange_list**. | + +-----------+---------+-------------------------------------------------------------+ + + Example: + + .. code-block:: + + select mygeosot, longitude, latitude from geosot where IN_POLYGON_RANGE_LIST('RANGELIST (526549722865860608 526549722865860618, 532555655580483584 532555655580483594)', 'OR'); + +- Performing polygon query + + **IN_POLYGON_JOIN(GEO_HASH_INDEX_COLUMN, POLYGON_COLUMN)** + + Perform join query on two tables. One is a spatial data table containing the longitude, latitude, and GeoHashIndex columns, and the other is a dimension table that saves polygon data. + + During query, **IN_POLYGON_JOIN UDF**, **GEO_HASH_INDEX_COLUMN**, and **POLYGON_COLUMN** of the polygon table are used. **Polygon_column** specifies the column containing multiple points (longitude and latitude pairs). The first and last points in each row of the Polygon table must be the same. All points in each row form a closed geometric shape. + + UDF input parameters + + +-----------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+========+=================================================================================================================================================================================+ + | GEO_HASH_INDEX_COLUMN | Long | GeoHashIndex column of the spatial data table. | + +-----------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | POLYGON_COLUMN | String | Polygon column of the polygon table, the value of which is represented by the string of polygon, for example, **POLYGON (( longitude1 latitude1, longitude2 latitude2, ...))**. | + +-----------------------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + Example: + + .. 
code-block:: + + CREATE TABLE polygonTable( + polygon string, + poiType string, + poiId String) + STORED AS carbondata; + + insert into polygonTable select 'POLYGON ((120.176433 30.327431,120.171283 30.322245, 120.181411 30.314540,120.190509 30.321653,120.185188 30.329358,120.176433 30.327431))','abc','1'; + + insert into polygonTable select 'POLYGON ((120.191603 30.328946,120.184179 30.327465, 120.181819 30.321464,120.190359 30.315388,120.199242 30.324464,120.191603 30.328946))','abc','2'; + + select t1.longitude,t1.latitude from geosot t1 + inner join + (select polygon,poiId from polygonTable where poitype='abc') t2 + on in_polygon_join(t1.mygeosot,t2.polygon) group by t1.longitude,t1.latitude; + +- Performing range_list query + + **IN_POLYGON_JOIN_RANGE_LIST(GEO_HASH_INDEX_COLUMN, POLYGON_COLUMN)** + + Use the **IN_POLYGON_JOIN_RANGE_LIST** UDF to associate the spatial data table with the polygon dimension table based on **Polygon_RangeList**. By using a range list, you can skip the conversion between a polygon and a range list. + + UDF input parameters + + +-----------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+========+======================================================================================================================================================================================+ + | GEO_HASH_INDEX_COLUMN | Long | GeoHashIndex column of the spatial data table. | + +-----------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | POLYGON_COLUMN | String | Rangelist column of the Polygon table, the value of which is represented by the string of rangeList, for example, **RANGELIST (startGeoId1 endGeoId1, startGeoId2 endGeoId2, ...)**. | + +-----------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + Example: + + .. code-block:: + + CREATE TABLE polygonTable( + polygon string, + poiType string, + poiId String) + STORED AS carbondata; + + insert into polygonTable select 'RANGELIST (526546455897309184 526546455897309284, 526549831217315840 526549831217315850, 532555655580483534 532555655580483584)','xyz','2'; + + select t1.* + from geosot t1 + inner join + (select polygon,poiId from polygonTable where poitype='xyz') t2 + on in_polygon_join_range_list(t1.mygeosot,t2.polygon); + +**UDFs of spacial index tools** + +- Obtaining row number and column number of a grid converted from GeoID + + **GeoIdToGridXy(geoId)** + + UDF input parameter + + +-----------+------+-------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+======+=========================================================================+ + | geoId | Long | Calculates the row number and column number of the grid based on GeoID. 
| + +-----------+------+-------------------------------------------------------------------------+ + + UDF output parameter + + +-----------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+============+==================================================================================================================================================================+ + | gridArray | Array[Int] | Returns the grid row and column numbers contained in GeoID in array. The first digit indicates the row number, and the second digit indicates the column number. | + +-----------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude, mygeohash, GeoIdToGridXy(mygeohash) as GridXY from geoTable; + +- Converting longitude and latitude to GeoID + + **LatLngToGeoId(latitude, longitude oriLatitude, gridSize)** + + UDF input parameters + + +-------------+--------+------------------------------------------------------------+ + | Parameter | Type | Description | + +=============+========+============================================================+ + | longitude | Long | Longitude. Note: The value is an integer after conversion. | + +-------------+--------+------------------------------------------------------------+ + | latitude | Long | Latitude. Note: The value is an integer after conversion. | + +-------------+--------+------------------------------------------------------------+ + | oriLatitude | Double | Origin latitude, required for calculating GeoID. | + +-------------+--------+------------------------------------------------------------+ + | gridSize | Int | Grid size, required for calculating GeoID. | + +-------------+--------+------------------------------------------------------------+ + + UDF output parameter + + +-----------+------+--------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+======+==========================================================================+ + | geoId | Long | Returns a number that indicates the longitude and latitude after coding. | + +-----------+------+--------------------------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude, mygeohash, LatLngToGeoId(latitude, longitude, 39.832277, 50) as geoId from geoTable; + +- Converting GeoID to longitude and latitude + + **GeoIdToLatLng(geoId, oriLatitude, gridSize)** + + UDF input parameters + + +-------------+--------+-----------------------------------------------------------------------+ + | Parameter | Type | Description | + +=============+========+=======================================================================+ + | geoId | Long | Calculates the longitude and latitude based on GeoID. | + +-------------+--------+-----------------------------------------------------------------------+ + | oriLatitude | Double | Origin latitude, required for calculating the longitude and latitude. | + +-------------+--------+-----------------------------------------------------------------------+ + | gridSize | Int | Grid size, required for calculating the longitude and latitude. 
| + +-------------+--------+-----------------------------------------------------------------------+ + + .. note:: + + GeoID is generated based on the grid coordinates, which are the grid center. Therefore, the calculated longitude and latitude are the longitude and latitude of the grid center. There may be an error ranging from 0 degree to half of the grid size between the calculated longitude and latitude and the longitude and latitude of the generated GeoID. + + UDF output parameter + + +----------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +======================+===============+============================================================================================================================================================================================+ + | latitudeAndLongitude | Array[Double] | Returns the longitude and latitude coordinates of the grid center that represent the GeoID in array. The first digit indicates the latitude, and the second digit indicates the longitude. | + +----------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + Example: + + .. code-block:: + + select longitude, latitude, mygeohash, GeoIdToLatLng(mygeohash, 39.832277, 50) as LatitudeAndLongitude from geoTable; + +- Calculating the upper-layer GeoID of the pyramid model + + **ToUpperLayerGeoId(geoId)** + + UDF input parameter + + +-----------+------+---------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+======+=================================================================================+ + | geoId | Long | Calculates the upper-layer GeoID of the pyramid model based on the input GeoID. | + +-----------+------+---------------------------------------------------------------------------------+ + + UDF output parameter + + ========= ==== =================================================== + Parameter Type Description + ========= ==== =================================================== + geoId Long Returns the upper-layer GeoID of the pyramid model. + ========= ==== =================================================== + + Example: + + .. code-block:: + + select longitude, latitude, mygeohash, ToUpperLayerGeoId(mygeohash) as upperLayerGeoId from geoTable; + +- Obtaining the GeoID range list using the input polygon + + **ToRangeList(polygon, oriLatitude, gridSize)** + + UDF input parameters + + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Type | Description | + +=======================+=======================+=====================================================================================================================================================================================+ + | polygon | String | Input polygon string, which is a pair of longitude and latitude. | + | | | | + | | | Longitude and latitude are separated by a space. Each pair of longitude and latitude is separated by a comma (,). 
The longitude and latitude at the start and end must be the same. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | oriLatitude | Double | Origin latitude, required for calculating GeoID. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | gridSize | Int | Grid size, required for calculating GeoID. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + UDF output parameter + + ========= =================== ========================================= + Parameter Type Description + ========= =================== ========================================= + geoIdList Buffer[Array[Long]] Converts polygons into GeoID range lists. + ========= =================== ========================================= + + Example: + + .. code-block:: + + select ToRangeList('116.321011 40.123503, 116.137676 39.947911, 116.560993 39.935276, 116.321011 40.123503', 39.832277, 50) as rangeList from geoTable; + +- Calculating the upper-layer longitude of the pyramid model + + **ToUpperLongitude (longitude, gridSize, oriLat)** + + UDF input parameters + + =========== ====== ==================================================== + Parameter Type Description + =========== ====== ==================================================== + longitude Long Input longitude, which is a long integer. + gridSize Int Grid size, required for calculating longitude. + oriLatitude Double Origin latitude, required for calculating longitude. + =========== ====== ==================================================== + + UDF output parameter + + ========= ==== ================================== + Parameter Type Description + ========= ==== ================================== + longitude Long Returns the upper-layer longitude. + ========= ==== ================================== + + Example: + + .. code-block:: + + select ToUpperLongitude (-23575161504L, 50, 39.832277) as upperLongitude from geoTable; + +- Calculating the upper-layer latitude of the pyramid model + + **ToUpperLatitude(Latitude, gridSize, oriLat)** + + UDF input parameters + + =========== ====== =================================================== + Parameter Type Description + =========== ====== =================================================== + latitude Long Input latitude, which is a long integer. + gridSize Int Grid size, required for calculating latitude. + oriLatitude Double Origin latitude, required for calculating latitude. + =========== ====== =================================================== + + UDF output parameter + + ========= ==== ================================= + Parameter Type Description + ========= ==== ================================= + Latitude Long Returns the upper-layer latitude. + ========= ==== ================================= + + Example: + + .. 
code-block:: + + select ToUpperLatitude (-23575161504L, 50, 39.832277) as upperLatitude from geoTable; + +- Converting longitude and latitude to GeoSOT + + **LatLngToGridCode(latitude, longitude, level)** + + UDF input parameters + + ========= ====== ================================== + Parameter Type Description + ========= ====== ================================== + latitude Double Latitude. + longitude Double Longitude. + level Int Level. The value range is [0, 32]. + ========= ====== ================================== + + UDF output parameter + + +-----------+------+---------------------------------------------------------------------------+ + | Parameter | Type | Description | + +===========+======+===========================================================================+ + | geoId | Long | A number that indicates the longitude and latitude after GeoSOT encoding. | + +-----------+------+---------------------------------------------------------------------------+ + + Example: + + .. code-block:: + + select LatLngToGridCode(39.930753, 116.302895, 21) as geoId; + +.. |image1| image:: /_static/images/en-us_image_0000001295739892.png diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/filter_result_is_not_consistent_with_hive_when_a_big_double_type_value_is_used_in_filter.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/filter_result_is_not_consistent_with_hive_when_a_big_double_type_value_is_used_in_filter.rst new file mode 100644 index 0000000..861ed33 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/filter_result_is_not_consistent_with_hive_when_a_big_double_type_value_is_used_in_filter.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_1455.html + +.. _mrs_01_1455: + +Filter Result Is not Consistent with Hive when a Big Double Type Value Is Used in Filter +======================================================================================== + +Symptom +------- + +When double data type values with higher precision are used in filters, incorrect values are returned by filtering results. + +Possible Causes +--------------- + +When double data type values with higher precision are used in filters, values are rounded off before comparison. Therefore, values of double data type with different fraction part are considered same. + +Troubleshooting Method +---------------------- + +NA. + +Procedure +--------- + +To avoid this problem, use decimal data type when high precision data comparisons are required, such as financial applications, equality and inequality checks, and rounding operations. + +Reference Information +--------------------- + +NA. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/index.rst new file mode 100644 index 0000000..5499587 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1454.html + +.. _mrs_01_1454: + +CarbonData Troubleshooting +========================== + +- :ref:`Filter Result Is not Consistent with Hive when a Big Double Type Value Is Used in Filter ` +- :ref:`Query Performance Deterioration ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + filter_result_is_not_consistent_with_hive_when_a_big_double_type_value_is_used_in_filter + query_performance_deterioration diff --git a/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/query_performance_deterioration.rst b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/query_performance_deterioration.rst new file mode 100644 index 0000000..1c08721 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/carbondata_troubleshooting/query_performance_deterioration.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_1456.html + +.. _mrs_01_1456: + +Query Performance Deterioration +=============================== + +Symptom +------- + +The query performance fluctuates when the query is executed in different query periods. + +Possible Causes +--------------- + +During data loading, the memory configured for each executor program instance may be insufficient, resulting in more Java GCs. When GC occurs, the query performance deteriorates. + +Troubleshooting Method +---------------------- + +On the Spark UI, the GC time of some executors is obviously higher than that of other executors, or all executors have high GC time. + +Procedure +--------- + +Log in to Manager and choose **Cluster** > **Services** > **Spark2x**. On the displayed page. click the **Configurations** tab and then **All Configurations**, search for **spark.executor.memory** in the search box, and set its to a larger value. + +|image1| + +Reference +--------- + +None + +.. |image1| image:: /_static/images/en-us_image_0000001295900080.png diff --git a/doc/component-operation-guide-lts/source/using_carbondata/configuration_reference.rst b/doc/component-operation-guide-lts/source/using_carbondata/configuration_reference.rst new file mode 100644 index 0000000..bd1ad07 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/configuration_reference.rst @@ -0,0 +1,330 @@ +:original_name: mrs_01_1404.html + +.. _mrs_01_1404: + +Configuration Reference +======================= + +This section provides the details of all the configurations required for the CarbonData System. + +.. table:: **Table 1** System configurations in **carbon.properties** + + +----------------------------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +============================+===========================+==================================================================================================================================================================================================================================================================================================+ + | carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | HDFS relative path from the HDFS base path, which is configured in **fs.defaultFS**. The path configured in **carbon.ddl.base.hdfs.url** will be appended to the HDFS path configured in **fs.defaultFS**. If this path is configured, you do not need to pass the complete path while dataload. 
| + | | | | + | | | For example, if the absolute path of the CSV file is **hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv**, the path **hdfs://10.18.101.155:54310** will come from property **fs.defaultFS** and you can configure **/data/cnbc/** as **carbon.ddl.base.hdfs.url**. | + | | | | + | | | During data loading, you can specify the CSV path as **/2016/xyz.csv**. | + +----------------------------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.badRecords.location | ``-`` | Storage path of bad records. This path is an HDFS path. The default value is **Null**. If bad records logging or bad records operation redirection is enabled, the path must be configured by the user. | + +----------------------------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.bad.records.action | fail | The following are four types of actions for bad records: | + | | | | + | | | **FORCE**: Data is automatically corrected by storing the bad records as NULL. | + | | | | + | | | **REDIRECT**: Bad records are written to the raw CSV instead of being loaded. | + | | | | + | | | **IGNORE**: Bad records are neither loaded nor written to the raw CSV. | + | | | | + | | | **FAIL**: Data loading fails if any bad records are found. | + +----------------------------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.update.sync.folder | /tmp/carbondata | Specifies the **modifiedTime.mdt** file path. You can set it to an existing path or a new path. | + | | | | + | | | .. note:: | + | | | | + | | | If you set this parameter to an existing path, ensure that all users can access the path and the path has the 777 permission. | + +----------------------------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1404__en-us_topic_0000001219350555_t197b7a04db3c4f919bd30707c2fdcd1f: + +.. 
table:: **Table 2** Performance configurations in **carbon.properties** + + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==================================================+=======================+===================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | **Data Loading Configuration** | | | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.sort.file.write.buffer.size | 16384 | CarbonData sorts data and writes it to a temporary file to limit memory usage. This parameter controls the size of the buffer used for reading and writing temporary files. The unit is bytes. | + | | | | + | | | The value ranges from 10240 to 10485760. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.graph.rowset.size | 100000 | Rowset size exchanged in data loading graph steps. | + | | | | + | | | The value ranges from 500 to 1,000,000. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.number.of.cores.while.loading | 6 | Number of cores used during data loading. The greater the number of cores, the better the compaction performance. If the CPU resources are sufficient, you can increase the value of this parameter. 
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.sort.size | 500000 | Number of records to be sorted | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.enableXXHash | true | Hashmap algorithm used for hashkey calculation | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.number.of.cores.block.sort | 7 | Number of cores used for sorting blocks during data loading | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.max.driver.lru.cache.size | -1 | Maximum size of LRU caching for data loading at the driver side. The unit is MB. The default value is **-1**, indicating that there is no memory limit for the caching. Only integer values greater than 0 are accepted. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.max.executor.lru.cache.size | -1 | Maximum size of LRU caching for data loading at the executor side. The unit is MB. The default value is **-1**, indicating that there is no memory limit for the caching. Only integer values greater than 0 are accepted. If this parameter is not configured, the value of **carbon.max.driver.lru.cache.size** is used. 
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.merge.sort.prefetch | true | Whether to enable prefetch of data during merge sort while reading data from sorted temp files in the process of data loading | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.update.persist.enable | true | Configuration to enable the dataset of RDD/dataframe to persist data. Enabling this will reduce the execution time of UPDATE operation. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.unsafe.sort | true | Whether to use unsafe sort during data loading. Unsafe sort reduces the garbage collection during data load operation, resulting in better performance. The default value is **true**, indicating that unsafe sort is enabled. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.offheap.sort | true | Whether to use off-heap memory for sorting of data during data loading | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | offheap.sort.chunk.size.inmb | 64 | Size of data chunks to be sorted, in MB. The value ranges from 1 to 1024. 
| + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.unsafe.working.memory.in.mb | 512 | Size of the unsafe working memory. It is used for sorting data and storing column pages. The unit is MB. | + | | | | + | | | Memory required for data loading: | + | | | | + | | | carbon.number.of.cores.while.loading [default value is 6] x Number of tables to load in parallel x offheap.sort.chunk.size.inmb [default value is 64 MB] + carbon.blockletgroup.size.in.mb [default value is 64 MB] + current compaction ratio [64 MB/3.5] | + | | | | + | | | = Around 900 MB per table | + | | | | + | | | Memory required for data query: | + | | | | + | | | (SPARK_EXECUTOR_INSTANCES [default value is 2] x (carbon.blockletgroup.size.in.mb [default value: 64 MB] + carbon.blockletgroup.size.in.mb [default value: 64 MB] x 3.5) x Number of cores per executor [default value: 1]) | + | | | | + | | | = ~ 600 MB | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.sort.inmemory.storage.size.in.mb | 512 | Size of the intermediate sort data to be kept in memory. Once the specified value is reached, the system writes data to the disk. The unit is MB. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | sort.inmemory.size.inmb | 1024 | Size of the intermediate sort data to be kept in memory. Once the specified value is reached, the system writes data to the disk. The unit is MB. | + | | | | + | | | If **carbon.unsafe.working.memory.in.mb** and **carbon.sort.inmemory.storage.size.in.mb** are configured, you do not need to set this parameter. If this parameter has been configured, 20% of the memory is used for working memory **carbon.unsafe.working.memory.in.mb**, and 80% is used for sort storage memory **carbon.sort.inmemory.storage.size.in.mb**. | + | | | | + | | | .. note:: | + | | | | + | | | The value of **spark.yarn.executor.memoryOverhead** configured for Spark must be greater than the value of **sort.inmemory.size.inmb** configured for CarbonData. Otherwise, Yarn might stop the executor if off-heap access exceeds the configured executor memory.
| + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.blockletgroup.size.in.mb | 64 | Data is read in groups of blocklets, which are called blocklet groups. This parameter specifies the size of each blocklet group. A higher value results in better sequential I/O access. | + | | | | + | | | The minimum value is 16 MB. Any value less than 16 MB will be reset to the default value (64 MB). | + | | | | + | | | The unit is MB. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.inmemory.merge.sort | false | Whether to enable in-memory merge sort. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | use.offheap.in.query.processing | true | Whether to use off-heap memory in query processing. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.load.sort.scope | local_sort | Sort scope for the load operation. There are two types of sort: **batch_sort** and **local_sort**. If **batch_sort** is selected, the loading performance is improved but the query performance is reduced.
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.batch.sort.size.inmb | ``-`` | Size of data to be considered for batch sorting during data loading. The recommended value is less than 45% of the total sort data. The unit is MB. | + | | | | + | | | .. note:: | + | | | | + | | | If this parameter is not set, its value is about 45% of the value of **sort.inmemory.size.inmb** by default. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.unsafe.columnpage | true | Whether to keep page data in heap memory during data loading or query to prevent garbage collection bottleneck. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.use.local.dir | false | Whether to use Yarn local directories for multi-disk data loading. If this parameter is set to **true**, Yarn local directories are used to load multi-disk data to improve data loading performance. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.use.multiple.temp.dir | false | Whether to use multiple temporary directories for storing temporary files to improve data loading performance. 
| + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.load.datamaps.parallel.db_name.table_name | NA | The value can be **true** or **false**. Set the database name and table name in the parameter name to improve the first query performance of that table. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **Compaction Configuration** | | | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.number.of.cores.while.compacting | 2 | Number of cores to be used while compacting data. The greater the number of cores, the better the compaction performance. If the CPU resources are sufficient, you can increase the value of this parameter. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.compaction.level.threshold | 4,3 | This configuration is for minor compaction, which decides how many segments are merged. | + | | | | + | | | For example, if this parameter is set to **2,3**, minor compaction is triggered every two segments. **3** is the number of level-1 compacted segments that are further compacted into a new segment. | + | | | | + | | | The value ranges from 0 to 100.
| + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.major.compaction.size | 1024 | Major compaction size. Segments are merged if their total size is below this threshold. | + | | | | + | | | The unit is MB. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.horizontal.compaction.enable | true | Whether to enable horizontal compaction. After every DELETE or UPDATE statement, horizontal compaction may occur if the number of incremental (DELETE/UPDATE) files exceeds the specified threshold. By default, this parameter is set to **true**. You can set this parameter to **false** to disable horizontal compaction. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.horizontal.update.compaction.threshold | 1 | Threshold on the number of UPDATE delta files within a segment. If the number of delta files exceeds the threshold, the UPDATE delta files within the segment become eligible for horizontal compaction and are compacted into a single UPDATE delta file. By default, this parameter is set to **1**. The value ranges from **1** to **10000**. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.horizontal.delete.compaction.threshold | 1 | Threshold on the number of DELETE incremental files within a block of a segment. If the number of incremental files exceeds the threshold, the DELETE incremental files for that block of the segment become eligible for horizontal compaction and are compacted into a single DELETE incremental file. By default, this parameter is set to **1**. The value ranges from **1** to **10000**.
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **Query Configuration** | | | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.number.of.cores | 4 | Number of cores to be used during query | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.limit.block.distribution.enable | false | Whether to enable the CarbonData distribution for limit query. The default value is **false**, indicating that block distribution is disabled for query statements that contain the keyword limit. For details about how to optimize this parameter, see :ref:`Configurations for Performance Tuning `. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.custom.block.distribution | false | Whether to enable Spark or CarbonData block distribution. By default, the value is **false**, indicating that Spark block distribution is enabled. To enable CarbonData block distribution, change the value to **true**. 
| + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.infilter.subquery.pushdown.enable | false | If this parameter is set to **true** and a SELECT query contains an IN filter with a subquery, the subquery is executed first and its output is broadcast as an IN filter to the left table. Otherwise, SortMergeSemiJoin is executed. You are advised to set this parameter to **true** when the IN filter subquery does not return too many records. For example, when the IN subquery returns 10,000 or fewer records, enabling this parameter returns query results faster. | + | | | | + | | | Example: **select \* from flow_carbon_256b where cus_no in (select cus_no from flow_carbon_256b where dt>='20260101' and dt<='20260701' and txn_bk='tk_1' and txn_br='tr_1') limit 1000;** | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.scheduler.minRegisteredResourcesRatio | 0.8 | Minimum resource (executor) ratio needed for starting the block distribution. The default value is **0.8**, indicating that 80% of the requested resources are allocated for starting block distribution. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.dynamicAllocation.schedulerTimeout | 5 | Maximum time that the scheduler waits for executors to be active. The default value is **5** seconds, and the maximum value is **15** seconds. | + +--------------------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable.unsafe.in.query.processing | true | Whether to use unsafe sort during query. Unsafe sort reduces the garbage collection during query, resulting in better performance. The default value is **true**, indicating that unsafe sort is enabled.
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.enable.vector.reader | true | Whether to enable vector processing for result collection to improve query performance | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.query.show.datamaps | true | **SHOW TABLES** lists all tables including the primary table and datamaps. To filter out the datamaps, set this parameter to **false**. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **Secondary Index Configuration** | | | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.secondary.index.creation.threads | 1 | Number of threads to concurrently process segments during secondary index creation. This property helps fine-tuning the system when there are a lot of segments in a table. The value ranges from 1 to 50. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.si.lookup.partialstring | true | - When the parameter value is **true**, it includes indexes started with, ended with, and contained. | + | | | - When the parameter value is **false**, it includes only secondary indexes started with. 
| + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.si.segment.merge | true | Enabling this property merges **.carbondata** files inside the secondary index segment. The merging will happen after the load operation. That is, at the end of the secondary index table load, small files are checked and merged. | + | | | | + | | | .. note:: | + | | | | + | | | Table Block Size is used as the size threshold for merging small files. | + +--------------------------------------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. table:: **Table 3** Other configurations in **carbon.properties** + + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==========================================+==========================+===========================================================================================================================================================================================================================================================================================+ + | **Data Loading Configuration** | | | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.lock.type | HDFSLOCK | Type of lock to be acquired during concurrent operations on a table. | + | | | | + | | | There are following types of lock implementation: | + | | | | + | | | - **LOCALLOCK**: Lock is created on local file system as a file. This lock is useful when only one Spark driver (or JDBCServer) runs on a machine. | + | | | - **HDFSLOCK**: Lock is created on HDFS file system as a file. This lock is useful when multiple Spark applications are running and no ZooKeeper is running on a cluster. 
| + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.sort.intermediate.files.limit | 20 | Minimum number of intermediate files. After intermediate files are generated, sort and merge the files. For details about how to optimize this parameter, see :ref:`Configurations for Performance Tuning `. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.csv.read.buffersize.byte | 1048576 | Size of CSV reading buffer | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.merge.sort.reader.thread | 3 | Maximum number of threads used for reading intermediate files for final merging. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.concurrent.lock.retries | 100 | Maximum number of retries used to obtain the concurrent operation lock. This parameter is used for concurrent loading. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.concurrent.lock.retry.timeout.sec | 1 | Interval between the retries to obtain the lock for concurrent operations. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.lock.retries | 3 | Maximum number of retries to obtain the lock for any operations other than import. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.lock.retry.timeout.sec | 5 | Interval between the retries to obtain the lock for any operation other than import. 
| + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.tempstore.location | /opt/Carbon/TempStoreLoc | Temporary storage location. By default, the **System.getProperty("java.io.tmpdir")** method is used to obtain the value. For details about how to optimize this parameter, see the description of **carbon.use.local.dir** in :ref:`Configurations for Performance Tuning `. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.load.log.counter | 500000 | Data loading records count in logs | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | SERIALIZATION_NULL_FORMAT | \\N | Value to be replaced with NULL | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.skip.empty.line | false | Setting this property will ignore the empty lines in the CSV file during data loading. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.load.datamaps.parallel | false | Whether to enable parallel datamap loading for all tables in all sessions. This property will improve the time to load datamaps into memory by distributing the job among executors, thus improving query performance. 
| + +------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **Merging Configuration** | | | + +------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.numberof.preserve.segments | 0 | Set this parameter if you want to exclude a number of recent segments from compaction. | + | | | | + | | | For example, if **carbon.numberof.preserve.segments** is set to **2**, the latest two segments will always be excluded from compaction. | + | | | | + | | | No segments will be preserved by default. | + +------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.allowed.compaction.days | 0 | This parameter restricts merging to segments loaded within the specified number of recent days. | + | | | | + | | | For example, if this parameter is set to **2**, only the segments loaded within the past two days are merged. Segments loaded more than two days ago are not merged. | + | | | | + | | | This configuration is disabled by default. | + +------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.enable.auto.load.merge | false | Whether to enable compaction along with data loading. | + +------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.merge.index.in.segment | true | This parameter enables merging all CarbonIndex files (**.carbonindex**) into a single MergeIndex file (**.carbonindexmerge**) when data loading is complete. This significantly reduces the delay in serving the first query.
| + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **Query Configuration** | | | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | max.query.execution.time | 60 | Maximum time allowed for one query to be executed. | + | | | | + | | | The unit is minute. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.enableMinMax | true | MinMax is used to improve query performance. You can set this to **false** to disable this function. | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.lease.recovery.retry.count | 5 | Maximum number of attempts that need to be made for recovering a lease on a file. | + | | | | + | | | Minimum value: **1** | + | | | | + | | | Maximum value: **50** | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | carbon.lease.recovery.retry.interval | 1000 (ms) | Interval or pause time after a lease recovery attempt is made on a file. | + | | | | + | | | Minimum value: **1000** (ms) | + | | | | + | | | Maximum value: **10000** (ms) | + +------------------------------------------+--------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. 
table:: **Table 4** Spark configuration reference in **spark-defaults.conf** + + +-----------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=============================+=======================+=====================================================================================================================================================================================================================================================+ + | spark.driver.memory | 4G | Memory to be used for the driver process, that is, the process where SparkContext is initialized. | + | | | | + | | | .. note:: | + | | | | + | | | In client mode, do not use SparkConf to set this parameter in the application because the driver JVM has already been started. To configure this parameter, configure it in the **--driver-memory** command-line option or in the default property file. | + +-----------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.executor.memory | 4GB | Memory to be used for each executor process. | + +-----------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.crossJoin.enabled | true | If the query contains a cross join, enable this property so that no error is thrown. In this case, you can use a cross join instead of a join for better performance. | + +-----------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Configure the following parameters in the **spark-defaults.conf** file on the Spark driver. + +- In spark-sql mode: + + .. _mrs_01_1404__en-us_topic_0000001219350555_ta902cd071dfb426097416a5c7034ee6c: + + ..
table:: **Table 5** Parameter description + + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | Description | + +========================================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+=================================================================================================================================================================================================================================================================+ + | spark.driver.extraJavaOptions | -Dlog4j.configuration=file:/opt/client/Spark2x/spark/conf/log4j.properties -Djetty.version=x.y.z -Dzookeeper.server.principal=zookeeper/hadoop.\ ** -Djava.security.krb5.conf=/opt/client/KrbClient/kerberos/var/krb5kdc/krb5.conf -Djava.security.auth.login.config=/opt/client/Spark2x/spark/conf/jaas.conf -Dorg.xerial.snappy.tempdir=/opt/client/Spark2x/tmp -Dcarbon.properties.filepath=/opt/client/Spark2x/spark/conf/carbon.properties -Djava.io.tmpdir=/opt/client/Spark2x/tmp | The default value\ **/opt/client/Spark2x/spark**\ indicates **CLIENT_HOME** of the client and is added to the end of the value of\ **spark.driver.extraJavaOptions**. This parameter is used to specify the path of the\ **carbon.properties**\ file in Driver. | + | | | | + | | | .. note:: | + | | | | + | | | Spaces next to equal marks (=) are not allowed. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.session.state.builder | org.apache.spark.sql.hive.FIHiveACLSessionStateBuilder | Session state constructor. 
| + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.carbon.sqlastbuilder.classname | org.apache.spark.sql.hive.CarbonInternalSqlAstBuilder | AST constructor. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.catalog.class | org.apache.spark.sql.hive.HiveACLExternalCatalog | Hive External catalog to be used. This parameter is mandatory if Spark ACL is enabled. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.hive.implementation | org.apache.spark.sql.hive.HiveACLClientImpl | How to call the Hive client. This parameter is mandatory if Spark ACL is enabled. 
| + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.hiveClient.isolation.enabled | false | This parameter is mandatory if Spark ACL is enabled. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- In JDBCServer mode: + + .. _mrs_01_1404__en-us_topic_0000001219350555_t3897ae14f205433fb0f98b79411cfa0c: + + .. 
table:: **Table 6** Parameter description + + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | Description | + +========================================+===========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+========================================================================================================================================================================================================================================================+ + | spark.driver.extraJavaOptions | -Xloggc:${SPARK_LOG_DIR}/indexserver-omm-%p-gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:MaxDirectMemorySize=512M -XX:MaxMetaspaceSize=512M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=10M -XX:OnOutOfMemoryError='kill -9 %p' -Djetty.version=x.y.z -Dorg.xerial.snappy.tempdir=${BIGDATA_HOME}/tmp/spark2x/JDBCServer/snappy_tmp -Djava.io.tmpdir=${BIGDATA_HOME}/tmp/spark2x/JDBCServer/io_tmp -Dcarbon.properties.filepath=${SPARK_CONF_DIR}/carbon.properties -Djdk.tls.ephemeralDHKeySize=2048 -Dspark.ssl.keyStore=${SPARK_CONF_DIR}/child.keystore #{java_stack_prefer} | The default value **${SPARK_CONF_DIR}** depends on a specific cluster and is added to the end of the value of the **spark.driver.extraJavaOptions** parameter. This parameter is used to specify the path of the **carbon.properties** file in Driver. | + | | | | + | | | .. note:: | + | | | | + | | | Spaces next to equal marks (=) are not allowed. 
| + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.session.state.builder | org.apache.spark.sql.hive.FIHiveACLSessionStateBuilder | Session state constructor. | + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.carbon.sqlastbuilder.classname | org.apache.spark.sql.hive.CarbonInternalSqlAstBuilder | AST constructor. | + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.catalog.class | org.apache.spark.sql.hive.HiveACLExternalCatalog | Hive External catalog to be used. This parameter is mandatory if Spark ACL is enabled. 
| + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.hive.implementation | org.apache.spark.sql.hive.HiveACLClientImpl | How to call the Hive client. This parameter is mandatory if Spark ACL is enabled. | + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spark.sql.hiveClient.isolation.enabled | false | This parameter is mandatory if Spark ACL is enabled. | + +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_carbondata/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/index.rst new file mode 100644 index 0000000..0950137 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/index.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1400.html + +.. 
_mrs_01_1400: + +Using CarbonData +================ + +- :ref:`Overview ` +- :ref:`Configuration Reference ` +- :ref:`CarbonData Operation Guide ` +- :ref:`CarbonData Performance Tuning ` +- :ref:`CarbonData Access Control ` +- :ref:`CarbonData Syntax Reference ` +- :ref:`CarbonData Troubleshooting ` +- :ref:`CarbonData FAQ ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + overview/index + configuration_reference + carbondata_operation_guide/index + carbondata_performance_tuning/index + carbondata_access_control + carbondata_syntax_reference/index + carbondata_troubleshooting/index + carbondata_faq/index diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/carbondata_overview.rst b/doc/component-operation-guide-lts/source/using_carbondata/overview/carbondata_overview.rst new file mode 100644 index 0000000..5d0db8b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/overview/carbondata_overview.rst @@ -0,0 +1,51 @@ +:original_name: mrs_01_1402.html + +.. _mrs_01_1402: + +CarbonData Overview +=================== + +CarbonData is a new Apache Hadoop native data-store format. CarbonData allows faster interactive queries over PetaBytes of data using advanced columnar storage, index, compression, and encoding techniques to improve computing efficiency. In addition, CarbonData is also a high-performance analysis engine that integrates data sources with Spark. + + +.. figure:: /_static/images/en-us_image_0000001348739953.png + :alt: **Figure 1** Basic architecture of CarbonData + + **Figure 1** Basic architecture of CarbonData + +The purpose of using CarbonData is to provide quick response to ad hoc queries of big data. Essentially, CarbonData is an Online Analytical Processing (OLAP) engine, which stores data by using tables similar to those in Relational Database Management System (RDBMS). You can import more than 10 TB data to tables created in CarbonData format, and CarbonData automatically organizes and stores data using the compressed multi-dimensional indexes. After data is loaded to CarbonData, CarbonData responds to ad hoc queries in seconds. + +CarbonData integrates data sources into the Spark ecosystem and you can query and analyze the data using Spark SQL. You can also use the third-party tool JDBCServer provided by Spark to connect to SparkSQL. + +Topology of CarbonData +---------------------- + +CarbonData runs as a data source inside Spark. Therefore, CarbonData does not start any additional processes on nodes in clusters. CarbonData engine runs inside the Spark executor. + + +.. figure:: /_static/images/en-us_image_0000001349259233.png + :alt: **Figure 2** Topology of CarbonData + + **Figure 2** Topology of CarbonData + +Data stored in CarbonData Table is divided into several CarbonData data files. Each time when data is queried, CarbonData Engine reads and filters data sets. CarbonData Engine runs as a part of the Spark Executor process and is responsible for handling a subset of data file blocks. + +Table data is stored in HDFS. Nodes in the same Spark cluster can be used as HDFS data nodes. + +CarbonData Features +------------------- + +- SQL: CarbonData is compatible with Spark SQL and supports SQL query operations performed on Spark SQL. +- Simple Table dataset definition: CarbonData allows you to define and create datasets by using user-friendly Data Definition Language (DDL) statements. CarbonData DDL is flexible and easy to use, and can define complex tables. 
+- Easy data management: CarbonData provides various data management functions for data loading and maintenance. CarbonData supports bulk loading of historical data and incremental loading of new data. Loaded data can be deleted based on load time, and a specific load operation can be undone. +- The CarbonData file format is a columnar store in HDFS. This format provides many column-based file storage features, such as table splitting and data compression. CarbonData has the following characteristics: + + - Stores data along with indexes: significantly accelerates queries and reduces I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indexes. A processing framework can leverage these indexes to reduce the number of tasks that need to be scheduled and processed, and it can also perform skip scans at a finer granularity (a unit called blocklet) during task-side scanning instead of scanning the whole file. + - Operable encoded data: with efficient compression support, CarbonData can query compressed and encoded data directly. The data is decoded only just before the results are returned to users, which is known as late materialization. + - Support for various use cases with a single data format: for example, interactive OLAP-style queries, sequential access (big scans), and random access (narrow scans). + +Key Technologies and Advantages of CarbonData +--------------------------------------------- + +- Quick query response: CarbonData features high-performance queries. The query speed of CarbonData is 10 times that of Spark SQL. It uses dedicated data formats and applies multiple index technologies and push-down optimizations, providing quick responses to TB-level data queries. +- Efficient data compression: CarbonData compresses data by combining lightweight and heavyweight compression algorithms, which saves 60% to 80% of data storage space and reduces hardware storage costs. diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst new file mode 100644 index 0000000..e0cc8c5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1401.html + +.. _mrs_01_1401: + +Overview +======== + +- :ref:`CarbonData Overview ` +- :ref:`Main Specifications of CarbonData ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + carbondata_overview + main_specifications_of_carbondata diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/main_specifications_of_carbondata.rst b/doc/component-operation-guide-lts/source/using_carbondata/overview/main_specifications_of_carbondata.rst new file mode 100644 index 0000000..4b96f74 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_carbondata/overview/main_specifications_of_carbondata.rst @@ -0,0 +1,72 @@ +:original_name: mrs_01_1403.html + +.. _mrs_01_1403: + +Main Specifications of CarbonData +================================= + + +Main Specifications of CarbonData +--------------------------------- + +..
table:: **Table 1** Main Specifications of CarbonData + + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + | Entity | Tested Value | Test Environment | + +====================================+========================================================================+=====================================================================================================+ + | Number of tables | 10000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. | + | | | | + | | | Total columns: 107 | + | | | | + | | | String: 75 | + | | | | + | | | Int: 13 | + | | | | + | | | BigInt: 7 | + | | | | + | | | Timestamp: 6 | + | | | | + | | | Double: 6 | + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + | Number of table columns | 2000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. | + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + | Maximum size of a raw CSV file | 200GB | 17 cluster nodes. 150 GB memory and 25 vCPUs for each executor. Driver memory: 10 GB, 17 executors. | + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + | Number of CSV files in each folder | 100 folders. Each folder has 10 files. The size of each file is 50 MB. | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. | + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + | Number of load folders | 10000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. | + +------------------------------------+------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ + +The memory required for data loading depends on the following factors: + +- Number of columns +- Column values +- Concurrency (configured using **carbon.number.of.cores.while.loading**) +- Sort size in memory (configured using **carbon.sort.size**) +- Intermediate cache (configured using **carbon.graph.rowset.size**) + +Data loading of an 8 GB CSV file that contains 10 million records and 300 columns with each row size being about 0.8 KB requires about 10 GB executor memory. That is, set **carbon.sort.size** to **100000** and retain the default values for other parameters. + +Table Specifications +-------------------- + +.. 
table:: **Table 2** Table specifications + + +-----------------------------------------------------------------------------------------------------------+--------------+ + | Entity | Tested Value | + +===========================================================================================================+==============+ + | Number of secondary index tables | 10 | + +-----------------------------------------------------------------------------------------------------------+--------------+ + | Number of composite columns in a secondary index table | 5 | + +-----------------------------------------------------------------------------------------------------------+--------------+ + | Length of column name in a secondary index table (unit: character) | 120 | + +-----------------------------------------------------------------------------------------------------------+--------------+ + | Length of a secondary index table name (unit: character) | 120 | + +-----------------------------------------------------------------------------------------------------------+--------------+ + | Cumulative length of all secondary index table names + column names in an index table\* (unit: character) | 3800*\* | + +-----------------------------------------------------------------------------------------------------------+--------------+ + +.. note:: + + - \* Characters of column names in an index table refers to the upper limit allowed by Hive or the upper limit of available resources. + - \*\* Secondary index tables are registered using Hive and stored in HiveSERDEPROPERTIES in JSON format. The value of **SERDEPROPERTIES** supported by Hive can contain a maximum of 4,000 characters and cannot be changed. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/adaptive_mv_usage_in_clickhouse.rst b/doc/component-operation-guide-lts/source/using_clickhouse/adaptive_mv_usage_in_clickhouse.rst new file mode 100644 index 0000000..0a510b7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/adaptive_mv_usage_in_clickhouse.rst @@ -0,0 +1,286 @@ +:original_name: mrs_01_24287.html + +.. _mrs_01_24287: + +Adaptive MV Usage in ClickHouse +=============================== + +Scenario +-------- + +Materialized views (MVs) are used in ClickHouse to save the precomputed result of time-consuming operations. When querying data, you can query the materialized views rather than the original tables, thereby quickly obtaining the query result. + +Currently, MVs are not easy to use in ClickHouse. Users can create one or more MVs based on the original table data as required. Once multiple MVs are created, you need to identify which MV is used and convert the query statement of an original table to that of an MV. In this way, the querying process is inefficient and prone to errors. + +The problem mentioned above is readily solved since the adoption of adaptive MVs. When querying an original table, the corresponding MV of this table will be queried, which greatly improves the usability and efficiency of ClickHouse. + +Matching Rules of Adaptive MVs +------------------------------ + +To ensure that the SQL statement for querying an original table can be automatically converted to that for querying the corresponding MV, the following matching rules must be met: + +- The table to be queried using an SQL statement must be associated with an MV. +- The AggregatingMergeTree engine must be used with MVs. +- Both the SELECT clause of SQL and MVs must contain aggregate functions. 
+- If the SQL query contains a GROUP BY clause, MVs must also contain this clause. +- If an MV contains a WHERE clause of SQL, the WHERE clause must be the same as that of the MV. This also applies to the PREWHERE and HAVING clauses. +- Fields to be queried using the SQL statements must exist in the MVs. +- If multiple MVs meet the preceding requirements, the SQL statement for querying the original table will be used. + +For details about common matching failures of adaptive MVs, see :ref:`Common Matching Failures of MVs `. + +Using Adaptive MVs +------------------ + +In the following operations, **local_table** is the original table and **view_table** is the MV created based on **local_table**. Change the table creation and query statements based on the site requirements. + +#. Use the ClickHouse client to connect to the default database. For details, see :ref:`Using ClickHouse from Scratch `. + +#. Run the following table creation statements to create the original table **local_table**. + + .. code-block:: + + CREATE TABLE local_table + ( + id String, + city String, + code String, + value UInt32, + create_time DateTime, + age UInt32 + ) + ENGINE = MergeTree + PARTITION BY toDate(create_time) + ORDER BY (id, city, create_time); + +#. Create the MV **view_table** based on **local_table**. + + .. code-block:: + + CREATE MATERIALIZED VIEW view_table + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id, city, create_time) + AS SELECT + create_time, + id, + city, + uniqState(code), + sumState(value) AS value_new, + minState(create_time) AS first_time, + maxState(create_time) AS last_time + FROM local_table + WHERE create_time >= toDateTime('2021-01-01 00:00:00') + GROUP BY id, city, create_time; + +#. Insert data to the **local_table** table. + + .. code-block:: + + INSERT INTO local_table values('1','zzz','code1',1,toDateTime('2021-01-02 00:00:00'), 10); + INSERT INTO local_table values('2','kkk','code2',2,toDateTime('2020-01-01 00:00:00'), 20); + INSERT INTO local_table values('3','ccc','code3',3,toDateTime('2022-01-01 00:00:00'), 30); + +#. Run the following command to enable the adaptive MVs. + + .. code-block:: + + set adaptive_materilized_view = 1; + + .. note:: + + If the **adaptive_materilized_view** parameter is set to **1**, the adaptive MVs are enabled. If it is set to **0**, the adaptive MVs are disabled. The default value is **0**. **set adaptive_materilized_view = 1;** is a session-level command and needs to be reset each time the client connects to the server. + +#. .. _mrs_01_24287__en-us_topic_0000001219029825_li1961113141710: + + Query data in the **local_table** table. + + .. code-block:: + + SELECT sum(value) + FROM local_table + WHERE create_time >= toDateTime('2021-01-01 00:00:00') + ┌─sumMerge(value_new)─┐ + │ 4 │ + └─────────────────────┘ + +#. Run the **explain syntax** command to view the execution plan of the SQL statement in step :ref:`6 `. According to the query result, **view_table** is queried. + + .. code-block:: + + EXPLAIN SYNTAX + SELECT sum(value) + FROM local_table + WHERE create_time >= toDateTime('2021-01-01 00:00:00') + ┌─explain────────────────────┐ + │ SELECT sumMerge(value_new) │ + │ FROM default.view_table │ + └────────────────────────────┘ + +.. _mrs_01_24287__en-us_topic_0000001219029825_section332911341804: + +Common Matching Failures of MVs +------------------------------- + +- When creating an MV, the aggregate functions must contain the State suffix. Otherwise, the corresponding MV cannot be matched. Example: + + .. 
code-block:: + + # # The MV agg_view is created based on the original table test_table. However, the count aggregate function does not contain the State suffix. + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + count(id) + FROM test_table + GROUP BY id,create_time; + + # To ensure that the MV can be matched, the count aggregate function for creating the MV must contain the State suffix. The correct example is as follows: + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + countState(id) + FROM test_table + GROUP BY id,create_time; + +- Only if the WHERE clause of the statement for querying an original table is completely the same as that in an MV can the MV be matched. + + For example, if the WHERE clause of the original table statement is **where a=b** while the WHERE clause of the MV is **where b=a**, the corresponding MV cannot be matched. + + However, if the statement for querying the original table does not contain the database name, the corresponding MV can be matched. Example: + + .. code-block:: + + # The MV view_test is created based on db_test.table_test. The WHERE clause for querying the original table contains the database name db_test. + CREATE MATERIALIZED VIEW db_test.view_test ENGINE = AggregatingMergeTree ORDER BY phone AS + SELECT + name, + phone, + uniqExactState(class) as uniq_class, + sumState(CRC32(phone)) + FROM db_test.table_test + WHERE (class, name) GLOBAL IN + ( + SELECT class, name FROM db_test.table_test + WHERE + name = 'zzzz' + AND class = 'calss one' + ) + GROUP BY + name, phone; + # If the WHERE clause does not contain the database name db_test, the corresponding MV will be matched. + USE db_test; + EXPLAIN SYNTAX + SELECT + name, + phone, + uniqExact(class) as uniq_class, + sum(CRC32(phone)) + FROM table_test + WHERE (class, name) GLOBAL IN + ( + SELECT class, name FROM table_test + WHERE + name = 'zzzz' + AND class = 'calss one' + ) + GROUP BY + name, phone; + +- If the GROUP BY clause contains functions, the corresponding MV can be matched only when the column field names in the functions are the same as those in an original table. Example: + + .. code-block:: + + # Create the MV agg_view based on test_table. + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id, city, create_time) + AS SELECT + create_time, + id, + city, + value as value1, + uniqState(code), + sumState(value) AS value_new, + minState(create_time) AS first_time, + maxState(create_time) AS last_time + FROM test_table + GROUP BY id, city, create_time, value1 % 2, value1; + # The corresponding MV can be matched if the statement is as follows: + SELECT uniq(code) FROM test_table GROUP BY id, city, value1 % 2; + # The corresponding MV cannot be matched if the statement is as follows: + SELECT uniq(code) FROM test_table GROUP BY id, city, value % 2; + +- In a created MV, the FROM clause cannot be a SELECT statement. Otherwise, the corresponding MV will fail to be matched. In the following example, the FROM clause is a SELECT statement. In this case, the corresponding MV cannot be matched. + + .. 
code-block:: + + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + countState(id) + FROM + (SELECT id, create_time FROM test_table) + GROUP BY id,create_time; + +- When querying original tables or creating MVs, an aggregate function cannot be used together with another aggregate function or a common function. Example: + + .. code-block:: + + # Case 1: Multiple aggregate functions are used when querying an original table. + # Create an MV. + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + countState(id) + FROM test_table + GROUP BY id,create_time; + # Two aggregate functions are used when querying the original table, leading to the MV matching failure. + SELECT count(id) + count(id) FROM test_table; + # Case 2: Multiple aggregate functions are used when creating an MV. + # Two countState(id) functions are used when creating the MV, leading to the MV matching failure. + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + (countState(id) + countState(id)) AS new_count + FROM test_table + GROUP BY id,create_time; + # The corresponding MV cannot be matched when querying the original table. + SELECT new_count FROM test_table; + + However, if the parameter of an aggregate function is the combination operation of fields, the corresponding MV can be matched. + + .. code-block:: + + CREATE MATERIALIZED VIEW agg_view + ENGINE = AggregatingMergeTree + PARTITION BY toDate(create_time) + ORDER BY (id) + AS SELECT + create_time, + id, + countState(id + id) + FROM test_table + GROUP BY id,create_time; + # The corresponding MV can be matched when querying the original table. + SELECT count(id + id) FROM test_table; diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_log_overview.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_log_overview.rst new file mode 100644 index 0000000..beea3d1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_log_overview.rst @@ -0,0 +1,94 @@ +:original_name: mrs_01_2399.html + +.. _mrs_01_2399: + +ClickHouse Log Overview +======================= + +Log Description +--------------- + +**Log path**: The default storage path of ClickHouse log files is as follows: **${BIGDATA_LOG_HOME}/clickhouse** + +**Log archive rule**: The automatic ClickHouse log compression function is enabled. By default, when the size of logs exceeds 100 MB, logs are automatically compressed into a log file named in the following format: **\ **.**\ *[ID]*\ **.gz**. A maximum of 10 latest compressed files are reserved by default. The number of compressed files can be configured on Manager. + +.. 
table:: **Table 1** ClickHouse log list + + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Log Type | Log File Name | Description | + +==========================+===========================================================================================================================+======================================================================================================================================+ + | Run logs | /var/log/Bigdata/clickhouse/clickhouseServer/clickhouse-server.err.log | Path of ClickHouseServer error log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/clickhouseServer/checkService.log | Path of key ClickHouseServer run log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/clickhouseServer/clickhouse-server.log | | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/balance/start.log | Path of ClickHouseBalancer startup log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/balance/error.log | Path of ClickHouseBalancer error log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/balance/access_http.log | Path of ClickHouseBalancer run log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Data migration logs | /var/log/Bigdata/clickhouse/migration/*Data migration task name*/clickhouse-copier_{timestamp}_{processId}/copier.log | Run logs generated when you use the migration tool by referring to :ref:`Using the ClickHouse Data Migration Tool `. 
| + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/clickhouse/migration/*Data migration task name*/clickhouse-copier_{timestamp}_{processId}/copier.err.log | Error logs generated when you use the migration tool by referring to :ref:`Using the ClickHouse Data Migration Tool `. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | /var/log/Bigdata/audit/clickhouse/clickhouse-server.audit.log | Path of ClickHouse audit log files. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Backup and recovery logs | /var/log/Bigdata/clickhouse/clickhouseServer/backup.log | Path of log files generated when ClickHouse performs the backup and restoration operations on Manager. | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` describes the log levels supported by ClickHouse. + +Levels of run logs are error, warning, trace, information, and debug from the highest to the lowest priority. Run logs of equal or higher levels are recorded. The higher the specified log level, the fewer the logs recorded. + +.. _mrs_01_2399__en-us_topic_0000001219029595_tc09b739e3eb34797a6da936a37654e97: + +.. table:: **Table 2** Log levels + + +----------+-------------+------------------------------------------------------------------------------------------+ + | Log Type | Level | Description | + +==========+=============+==========================================================================================+ + | Run log | error | Logs of this level record error information about system running. | + +----------+-------------+------------------------------------------------------------------------------------------+ + | | warning | Logs of this level record exception information about the current event processing. | + +----------+-------------+------------------------------------------------------------------------------------------+ + | | trace | Logs of this level record trace information about the current event processing. | + +----------+-------------+------------------------------------------------------------------------------------------+ + | | information | Logs of this level record normal running status information about the system and events. | + +----------+-------------+------------------------------------------------------------------------------------------+ + | | debug | Logs of this level record system running and debugging information. 
| + +----------+-------------+------------------------------------------------------------------------------------------+ + +To modify log levels, perform the following operations: + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **ClickHouse** > **Configurations**. +#. Select **All Configurations**. +#. On the menu bar on the left, select the log menu of the target role. +#. Select a desired log level. +#. Click **Save**. Then, click **OK**. + +.. note:: + + The configurations take effect immediately without the need to restart the service. + +Log Format +---------- + +The following table lists the ClickHouse log format: + +.. table:: **Table 3** Log formats + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Log Type | Format | Example | + +=======================+========================================================================================================================================================+==========================================================================================================================================================================================================================================+ + | Run log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2021.02.23 15:26:30.691301 [ 6085 ] {} DynamicQueryHandler: Code: 516, e.displayText() = DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name, Stack trace (when copying this | + | | | | + | | | message, always include the lines below): | + | | | | + | | | 0. Poco::Exception::Exception(std::__1::basic_string, std::__1::allocator > const&, int) @ 0x1250e59c | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_table_engine_overview.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_table_engine_overview.rst new file mode 100644 index 0000000..454baed --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_table_engine_overview.rst @@ -0,0 +1,389 @@ +:original_name: mrs_01_24105.html + +.. _mrs_01_24105: + +ClickHouse Table Engine Overview +================================ + +Background +---------- + +Table engines play a key role in ClickHouse to determine: + +- Where to write and read data +- Supported query modes +- Whether concurrent data access is supported +- Whether indexes can be used +- Whether multi-thread requests can be executed +- Parameters used for data replication + +This section describes MergeTree and Distributed engines, which are the most important and frequently used ClickHouse table engines. 
+ +MergeTree Family +---------------- + +Engines of the MergeTree family are the most universal and functional table engines for high-load tasks. They have the following key features: + +- Data is stored by partition and block based on partitioning keys. +- Data indexes are sorted based on primary keys and the **ORDER BY** sorting keys. +- Data replication is supported by table engines prefixed with Replicated. +- Data sampling is supported. + +When data is written, a table with this type of engine divides data into different folders based on the partitioning key. Each column of data in a folder is an independent file, and a file that records the serialized index sorting is created. This structure reduces the volume of data to be retrieved during data reading, greatly improving query efficiency. + +- MergeTree + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2], + ... + INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1, + INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2 + ) ENGINE = MergeTree() + ORDER BY expr + [PARTITION BY expr] + [PRIMARY KEY expr] + [SAMPLE BY expr] + [TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...] + [SETTINGS name=value, ...] + + **Example**: + + .. code-block:: + + CREATE TABLE default.test ( + name1 DateTime, + name2 String, + name3 String, + name4 String, + name5 Date, + ... + ) ENGINE = MergeTree() + PARTITION BY toYYYYMM(name5) + ORDER BY (name1, name2) + SETTINGS index_granularity = 8192 + + Parameters in the example are described as follows: + + - **ENGINE = MergeTree()**: specifies the MergeTree engine. + - **PARTITION BY** **toYYYYMM(name5)**: specifies the partition. The sample data is partitioned by month, and a folder is created for each month. + - **ORDER BY**: specifies the sorting fields. Multiple fields can be used for sorting: if the values of the first field are the same, the second field is used for sorting, and so on. + - **index_granularity = 8192**: specifies the index granularity. One index value is recorded for every 8,192 data records. + + If the data to be queried exists in a partition or sorting field, the data query time can be greatly reduced. + +- ReplacingMergeTree + + Different from MergeTree, ReplacingMergeTree deletes duplicate entries with the same sorting key. ReplacingMergeTree is suitable for clearing duplicate data to save space, but it does not guarantee the absence of duplicate data. Generally, it is not recommended. + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = ReplacingMergeTree([ver]) + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [SETTINGS name=value, ...] + +- SummingMergeTree + + When merging data parts in SummingMergeTree tables, ClickHouse merges all rows with the same primary key into one row that contains summed values for the columns with the numeric data type. If the primary key is composed in a way that a single key value corresponds to a large number of rows, storage volume can be significantly reduced and the data query speed can be accelerated. + + **Syntax for creating a table**: + + ..
code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = SummingMergeTree([columns]) + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [SETTINGS name=value, ...] + + **Example**: + + Create a SummingMergeTree table named **testTable**. + + .. code-block:: + + CREATE TABLE testTable + ( + id UInt32, + value UInt32 + ) + ENGINE = SummingMergeTree() + ORDER BY id + + Insert data into the table. + + .. code-block:: + + INSERT INTO testTable Values(5,9),(5,3),(4,6),(1,2),(2,5),(1,4),(3,8); + INSERT INTO testTable Values(88,5),(5,5),(3,7),(3,5),(1,6),(2,6),(4,7),(4,6),(43,5),(5,9),(3,6); + + Query all data in unmerged parts. + + .. code-block:: + + SELECT * FROM testTable + ┌─id─┬─value─┐ + │ 1 │ 6 │ + │ 2 │ 5 │ + │ 3 │ 8 │ + │ 4 │ 6 │ + │ 5 │ 12 │ + └───┴──── ┘ + ┌─id─┬─value─┐ + │ 1 │ 6 │ + │ 2 │ 6 │ + │ 3 │ 18 │ + │ 4 │ 13 │ + │ 5 │ 14 │ + │ 43 │ 5 │ + │ 88 │ 5 │ + └───┴──── ┘ + + If ClickHouse has not summed up all rows and you need to aggregate data by ID, use the **sum** function and **GROUP BY** statement. + + .. code-block:: + + SELECT id, sum(value) FROM testTable GROUP BY id + ┌─id─┬─sum(value)─┐ + │ 4 │ 19 │ + │ 3 │ 26 │ + │ 88 │ 5 │ + │ 2 │ 11 │ + │ 5 │ 26 │ + │ 1 │ 12 │ + │ 43 │ 5 │ + └───┴───────┘ + + Merge rows manually. + + .. code-block:: + + OPTIMIZE TABLE testTable + + Query data in the **testTable** table again. + + .. code-block:: + + SELECT * FROM testTable + ┌─id─┬─value─┐ + │ 1 │ 12 │ + │ 2 │ 11 │ + │ 3 │ 26 │ + │ 4 │ 19 │ + │ 5 │ 26 │ + │ 43 │ 5 │ + │ 88 │ 5 │ + └───┴──── ┘ + + SummingMergeTree uses the **ORDER BY** sorting keys as the condition keys to aggregate data. That is, if sorting keys are the same, data records are merged into one and the specified merged fields are aggregated. + + Data is pre-aggregated only when merging is executed in the background, and the merging execution time cannot be predicted. Therefore, it is possible that some data has been pre-aggregated and some data has not been aggregated. Therefore, the **GROUP BY** statement must be used during aggregation. + +- AggregatingMergeTree + + AggregatingMergeTree is a pre-aggregation engine used to improve aggregation performance. When merging partitions, the AggregatingMergeTree engine aggregates data based on predefined conditions, calculates data based on predefined aggregate functions, and saves the data in binary format to tables. + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = AggregatingMergeTree() + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [TTL expr] + [SETTINGS name=value, ...] + + **Example**: + + You do not need to set the AggregatingMergeTree parameter separately. When partitions are merged, data in each partition is aggregated based on the **ORDER BY** sorting key. You can set the aggregate functions to be used and column fields to be calculated by defining the AggregateFunction type, as shown in the following example: + + .. 
code-block:: + + create table test_table ( + name1 String, + name2 String, + name3 AggregateFunction(uniq,String), + name4 AggregateFunction(sum,Int), + name5 DateTime + ) ENGINE = AggregatingMergeTree() + PARTITION BY toYYYYMM(name5) + ORDER BY (name1,name2) + PRIMARY KEY name1; + + When data of the AggregateFunction type is written or queried, the **\*state** and **\*merge** functions need to be called. The asterisk (``*``) indicates the aggregate functions used for defining the field type. For example, the **uniq** and **sum** functions are specified for the **name3** and **name4** fields defined in the **test_table**, respectively. Therefore, you need to call the **uniqState** and **sumState** functions and run the **INSERT** and **SELECT** statements when writing data into the table. + + .. code-block:: + + insert into test_table select '8','test1',uniqState('name1'),sumState(toInt32(100)),'2021-04-30 17:18:00'; + insert into test_table select '8','test1',uniqState('name1'),sumState(toInt32(200)),'2021-04-30 17:18:00'; + + When querying data, you need to call the corresponding functions **uniqMerge** and **sumMerge**. + + .. code-block:: + + select name1,name2,uniqMerge(name3),sumMerge(name4) from test_table group by name1,name2; + ┌─name1─┬─name2─┬─uniqMerge(name3)─┬─sumMerge(name4)─┐ + │ 8 │ test1 │ 1 │ 300 │ + └──── ┴──── ┴──────────┴───────── ┘ + + AggregatingMergeTree is more commonly used with materialized views, which are query views of other data tables at the upper layer. + +- CollapsingMergeTree + + CollapsingMergeTree defines a **Sign** field to record status of data rows. If **Sign** is **1**, the data in this row is valid. If **Sign** is **-1**, the data in this row needs to be deleted. + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = CollapsingMergeTree(sign) + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [SETTINGS name=value, ...] + +- VersionedCollapsingMergeTree + + The VersionedCollapsingMergeTree engine adds **Version** to the table creation statement to record the mapping between a **state** row and a **cancel** row in case that rows are out of order. The rows with the same primary key, same **Version**, and opposite **Sign** will be deleted during compaction. + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = VersionedCollapsingMergeTree(sign, version) + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [SETTINGS name=value, ...] + +- GraphiteMergeTree + + The GraphiteMergeTree engine is used to store data in the time series database Graphite. + + **Syntax for creating a table**: + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + Path String, + Time DateTime, + Value , + Version + ... + ) ENGINE = GraphiteMergeTree(config_section) + [PARTITION BY expr] + [ORDER BY expr] + [SAMPLE BY expr] + [SETTINGS name=value, ...] + +Replicated*MergeTree Engines +---------------------------- + +All engines of the MergeTree family in ClickHouse prefixed with Replicated become MergeTree engines that support replicas. + +|image1| + +Replicated series engines use ZooKeeper to synchronize data. 
When a replicated table is created, all replicas of the same shard are synchronized based on the information registered with ZooKeeper. + +**Template for creating a Replicated engine**: + +.. code-block:: + + ENGINE = Replicated*MergeTree('Storage path in ZooKeeper','Replica name', ...) + +Two parameters need to be specified for a Replicated engine: + +- *Storage path in ZooKeeper*: specifies the path for storing table data in ZooKeeper. The path format is **/clickhouse/tables/{shard}/Database name/Table name**. +- *Replica name*: Generally, **{replica}** is used. + +For details about the example, see :ref:`Creating a ClickHouse Table `. + +Distributed Engine +------------------ + +The Distributed engine does not store any data. It serves as a transparent proxy for data shards and can automatically transmit data to each node in the cluster. Distributed tables need to work with other local data tables. Distributed tables distribute received read and write tasks to each local table where data is stored. + + +.. figure:: /_static/images/en-us_image_0000001295899964.png + :alt: **Figure 1** Working principle of the Distributed engine + + **Figure 1** Working principle of the Distributed engine + +**Template for creating a Distributed engine**: + +.. code-block:: + + ENGINE = Distributed(cluster_name, database_name, table_name, [sharding_key]) + +Parameters of a distributed table are described as follows: + +- **cluster_name**: specifies the cluster name. When a distributed table is read or written, the cluster configuration information is used to search for the corresponding ClickHouse instance node. +- **database_name**: specifies the database name. +- **table_name**: specifies the name of a local table in the database. It is used to map a distributed table to a local table. +- **sharding_key** (optional): specifies the sharding key, based on which a distributed table distributes data to each local table. + +**Example**: + +.. code-block:: + + -- Create a ReplicatedMergeTree local table named test. + CREATE TABLE default.test ON CLUSTER default_cluster_1 + ( + `EventDate` DateTime, + `id` UInt64 + ) + ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}') + PARTITION BY toYYYYMM(EventDate) + ORDER BY id + + -- Create a distributed table named test_all based on the local table test. + CREATE TABLE default.test_all ON CLUSTER default_cluster_1 + ( + `EventDate` DateTime, + `id` UInt64 + ) + ENGINE = Distributed(default_cluster_1, default, test, rand()) + +**Rules for creating a distributed table**: + +- When creating a distributed table, add **ON CLUSTER** *cluster_name* to the table creation statement so that the statement can be executed once on a ClickHouse instance and then distributed to all instances in the cluster for execution. +- Generally, a distributed table is named in the following format: *Local table name*\ \_all. It forms a one-to-many mapping with local tables. Then, multiple local tables can be operated using the distributed table proxy. +- Ensure that the structure of a distributed table is the same as that of local tables. If they are inconsistent, no error is reported during table creation, but an exception may be reported during data query or insertion. + +.. 
|image1| image:: /_static/images/en-us_image_0000001296059804.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/alter_table_modifying_a_table_structure.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/alter_table_modifying_a_table_structure.rst new file mode 100644 index 0000000..16d288c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/alter_table_modifying_a_table_structure.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_24204.html + +.. _mrs_01_24204: + +ALTER TABLE: Modifying a Table Structure +======================================== + +This section describes the basic syntax and usage of the SQL statement for modifying a table structure in ClickHouse. + +Basic Syntax +------------ + +**ALTER TABLE** [*database_name*].\ *name* [**ON CLUSTER** *cluster*] **ADD**\ \|\ **DROP**\ \|\ **CLEAR**\ \|\ **COMMENT**\ \|\ **MODIFY** **COLUMN** ... + +.. note:: + + **ALTER** supports only ``*``\ MergeTree, Merge, and Distributed engine tables. + +Example +------- + +.. code-block:: + + -- Add the test01 column to the t1 table. + ALTER TABLE t1 ADD COLUMN test01 String DEFAULT 'defaultvalue'; + -- Query the modified table t1. + desc t1 + ┌─name────┬─type─┬─default_type─┬─default_expression ┬─comment─┬─codec_expression─┬─ttl_expression─┐ + │ id │ UInt8 │ │ │ │ │ │ + │ name │ String │ │ │ │ │ │ + │ address │ String │ │ │ │ │ │ + │ test01 │ String │ DEFAULT │ 'defaultvalue' │ │ │ │ + └───────┴────┴────────┴────────── ┴───── ┴──────────┴─────────┘ + -- Change the type of the name column in the t1 table to UInt8. + ALTER TABLE t1 MODIFY COLUMN name UInt8; + -- Query the modified table t1. + desc t1 + ┌─name────┬─type─┬─default_type─┬─default_expression ┬─comment─┬─codec_expression─┬─ttl_expression─┐ + │ id │ UInt8 │ │ │ │ │ │ + │ name │ UInt8 │ │ │ │ │ │ + │ address │ String │ │ │ │ │ │ + │ test01 │ String │ DEFAULT │ 'defaultvalue' │ │ │ │ + └───────┴────┴────────┴────────── ┴───── ┴──────────┴─────────┘ + -- Delete the test01 column from the t1 table. + ALTER TABLE t1 DROP COLUMN test01; + -- Query the modified table t1. + desc t1 + ┌─name────┬─type─┬─default_type─┬─default_expression ┬─comment─┬─codec_expression─┬─ttl_expression─┐ + │ id │ UInt8 │ │ │ │ │ │ + │ name │ UInt8 │ │ │ │ │ │ + │ address │ String │ │ │ │ │ │ + └───────┴────┴────────┴────────── ┴───── ┴──────────┴─────────┘ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_database_creating_a_database.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_database_creating_a_database.rst new file mode 100644 index 0000000..c819484 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_database_creating_a_database.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_24200.html + +.. _mrs_01_24200: + +CREATE DATABASE: Creating a Database +==================================== + +This section describes the basic syntax and usage of the SQL statement for creating a ClickHouse database. + +Basic Syntax +------------ + +**CREATE DATABASE [IF NOT EXISTS]** *Database_name* **[ON CLUSTER** *ClickHouse cluster name*\ **]** + +*ClickHouse cluster name* is **default_cluster** by default. + +.. 
note:: + + The syntax **ON CLUSTER** *ClickHouse cluster name* enables the Data Definition Language (DDL) statement to be executed on all instances in the cluster at a time. You can run the following statement to obtain the cluster name from the **cluster** field: + + **select cluster,shard_num,replica_num,host_name from system.clusters;** + +Example +------- + +.. code-block:: + + -- Create a database named test. + CREATE DATABASE test ON CLUSTER default_cluster; + -- After the creation is successful, run the query command for verification. + show databases; + ┌─name───┐ + │ default │ + │ system │ + │ test │ + └──────┘ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_table_creating_a_table.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_table_creating_a_table.rst new file mode 100644 index 0000000..2938180 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/create_table_creating_a_table.rst @@ -0,0 +1,60 @@ +:original_name: mrs_01_24201.html + +.. _mrs_01_24201: + +CREATE TABLE: Creating a Table +============================== + +This section describes the basic syntax and usage of the SQL statement for creating a ClickHouse table. + +Basic Syntax +------------ + +- Method 1: Creating a table named **table_name** in the specified **database_name** database. + + If the table creation statement does not contain **database_name**, the name of the database selected during client login is used by default. + + **CREATE TABLE [IF NOT EXISTS]** *[database_name.]table_name* **[ON CLUSTER** *ClickHouse cluster name*\ **]** + + **(** + + *name1* **[**\ *type1*\ **] [DEFAULT\|MATERIALIZED\|ALIAS** *expr1*\ **],** + + *name2* **[**\ *type2*\ **] [DEFAULT\|MATERIALIZED\|ALIAS** *expr2*\ **],** + + **...** + + **)** *ENGINE* = *engine\_name()* + + [**PARTITION BY** *expr_list*] + + [**ORDER BY** *expr_list*] + + .. caution:: + + You are advised to use **PARTITION BY** to create table partitions when creating a ClickHouse table. The ClickHouse data migration tool migrates data based on table partitions. If you do not use **PARTITION BY** to create table partitions during table creation, the table data cannot be migrated on the GUI in :ref:`Using the ClickHouse Data Migration Tool `. + +- Method 2: Creating a table with the same structure as **database_name2.table_name2** and specifying a different table engine for the table + + If no table engine is specified, the created table uses the same table engine as **database_name2.table_name2**. + + **CREATE TABLE [IF NOT EXISTS]** *[database_name.]table_name* **AS** [*database_name2*.]\ *table_name*\ 2 [ENGINE = *engine\_name*] + +- Method 3: Using the specified engine to create a table with the same structure as the result of the **SELECT** clause and filling it with the result of the **SELECT** clause + + **CREATE TABLE [IF NOT EXISTS]** *[database_name.]table_name* *ENGINE* = *engine\_name* **AS SELECT** ... + +Example +------- + +.. code-block:: + + -- Create a table named test in the default database and default_cluster cluster. 
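 +    -- The {shard} and {replica} values in the ReplicatedMergeTree parameters below are macros resolved on each node, so all replicas of the same shard register under the same ZooKeeper path.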
+ CREATE TABLE default.test ON CLUSTER default_cluster + ( + `EventDate` DateTime, + `id` UInt64 + ) + ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}') + PARTITION BY toYYYYMM(EventDate) + ORDER BY id diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/desc_querying_a_table_structure.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/desc_querying_a_table_structure.rst new file mode 100644 index 0000000..be73779 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/desc_querying_a_table_structure.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24205.html + +.. _mrs_01_24205: + +DESC: Querying a Table Structure +================================ + +This section describes the basic syntax and usage of the SQL statement for querying a table structure in ClickHouse. + +Basic Syntax +------------ + +**DESC**\ \|\ **DESCRIBE** **TABLE** [*database_name*.]\ *table* [**INTO** OUTFILE filename] [FORMAT format] + +Example +------- + +.. code-block:: + + -- Query the t1 table structure. + desc t1; + ┌─name────┬─type─┬─default_type─┬─default_expression ┬─comment─┬─codec_expression─┬─ttl_expression─┐ + │ id │ UInt8 │ │ │ │ │ │ + │ name │ UInt8 │ │ │ │ │ │ + │ address │ String │ │ │ │ │ │ + └───────┴────┴────────┴────────── ┴───── ┴──────────┴─────────┘ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/drop_deleting_a_table.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/drop_deleting_a_table.rst new file mode 100644 index 0000000..f74d001 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/drop_deleting_a_table.rst @@ -0,0 +1,21 @@ +:original_name: mrs_01_24208.html + +.. _mrs_01_24208: + +DROP: Deleting a Table +====================== + +This section describes the basic syntax and usage of the SQL statement for deleting a ClickHouse table. + +Basic Syntax +------------ + +**DROP** [**TEMPORARY**] **TABLE** [**IF EXISTS**] [*database_name*.]\ *name* [**ON CLUSTER** *cluster*] + +Example +------- + +.. code-block:: + + -- Delete the t1 table. + drop t1; diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst new file mode 100644 index 0000000..eb5eb30 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst @@ -0,0 +1,99 @@ +:original_name: mrs_01_24206.html + +.. _mrs_01_24206: + +Importing and Exporting File Data +================================= + +This section describes the basic syntax and usage of the SQL statement for importing and exporting file data in ClickHouse. + +- Importing data in CSV format + + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **--secure --format_csv_delimiter="**\ *CSV file delimiter*\ **" --query="INSERT INTO** *Table name* **FORMAT CSV" <** *Host path where the CSV file is stored* + + Example + + .. 
code-block:: + + clickhouse client --host 10.5.208.5 --database testdb --port 21427 --secure --format_csv_delimiter="," --query="INSERT INTO testdb.csv_table FORMAT CSV" < /opt/data.csv + + You need to create a table in advance. + +- Exporting data in CSV format + + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query=**"SELECT \* **FROM** *Table name*" > *CSV file export path* + + Example + + .. code-block:: + + clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table" > /opt/test.csv + +- Importing data in Parquet format + + **cat** *Parquet file* **\| clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query="INSERT INTO** *Table name* **FORMAT Parquet"** + + Example + + .. code-block:: + + cat /opt/student.parquet | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO parquet_tab001 FORMAT Parquet" + +- Exporting data in Parquet format + + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query=**"**select** \* **from** *Table name* **FORMAT Parquet**" > *Parquet file export path* + + Example + + .. code-block:: + + clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="select * from test_table FORMAT Parquet" > /opt/student.parquet + +- Importing data in ORC format + + **cat** *ORC file path* **\| clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query=**"**INSERT INTO** *Table name* **FORMAT ORC**" + + Example + + .. code-block:: + + cat /opt/student.orc | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" + # Data in the ORC file can be exported from HDFS. For example: + hdfs dfs -cat /user/hive/warehouse/hivedb.db/emp_orc/000000_0_copy_1 | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" + +- Exporting data in ORC format + + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m** **--secure --query=**"**select** \* **from** *Table name* **FORMAT ORC**" > *ORC file export path* + + Example + + .. code-block:: + + clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="select * from csv_tab001 FORMAT ORC" > /opt/student.orc + +- Importing data in JSON format + + **INSERT INTO** *Table name* **FORMAT JSONEachRow** *JSON string* *1* *JSON string 2* + + Example + + .. code-block:: + + INSERT INTO test_table001 FORMAT JSONEachRow {"PageViews":5, "UserID":"4324182021466249494", "Duration":146,"Sign":-1} {"UserID":"4324182021466249494","PageViews":6,"Duration":185,"Sign":1} + +- Exporting data in JSON format + + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query=**"**SELECT** \* **FROM** *Table name* **FORMAT JSON|JSONEachRow|JSONCompact|...**" > *JSON file export path* + + Example + + .. code-block:: + + # Export JSON file. 
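+       # (FORMAT JSON wraps the whole result in one JSON object with "meta", "data",
+       #  "rows" and statistics fields; JSONEachRow below writes one JSON object per row.)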
+ clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSON" > /opt/test.json + + # Export json(JSONEachRow). + clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSONEachRow" > /opt/test_jsoneachrow.json + + # Export json(JSONCompact). + clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSONCompact" > /opt/test_jsoncompact.json diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst new file mode 100644 index 0000000..8cb7011 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_24199.html + +.. _mrs_01_24199: + +Common ClickHouse SQL Syntax +============================ + +- :ref:`CREATE DATABASE: Creating a Database ` +- :ref:`CREATE TABLE: Creating a Table ` +- :ref:`INSERT INTO: Inserting Data into a Table ` +- :ref:`SELECT: Querying Table Data ` +- :ref:`ALTER TABLE: Modifying a Table Structure ` +- :ref:`DESC: Querying a Table Structure ` +- :ref:`DROP: Deleting a Table ` +- :ref:`SHOW: Displaying Information About Databases and Tables ` +- :ref:`Importing and Exporting File Data ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + create_database_creating_a_database + create_table_creating_a_table + insert_into_inserting_data_into_a_table + select_querying_table_data + alter_table_modifying_a_table_structure + desc_querying_a_table_structure + drop_deleting_a_table + show_displaying_information_about_databases_and_tables + importing_and_exporting_file_data diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/insert_into_inserting_data_into_a_table.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/insert_into_inserting_data_into_a_table.rst new file mode 100644 index 0000000..f478ae1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/insert_into_inserting_data_into_a_table.rst @@ -0,0 +1,33 @@ +:original_name: mrs_01_24202.html + +.. _mrs_01_24202: + +INSERT INTO: Inserting Data into a Table +======================================== + +This section describes the basic syntax and usage of the SQL statement for inserting data to a table in ClickHouse. + +Basic Syntax +------------ + +- Method 1: Inserting data in standard format + + **INSERT INTO** *[database_name.]table* [(*c1, c2, c3*)] **VALUES** (*v11, v12, v13*), (*v21, v22, v23*), ... + +- Method 2: Using the **SELECT** result to insert data + + **INSERT INTO** *[database_name.]table* [(c1, c2, c3)] **SELECT** ... + +Example +------- + +.. code-block:: + + -- Insert data into the test2 table. + insert into test2 (id, name) values (1, 'abc'), (2, 'bbbb'); + -- Query data in the test2 table. 
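+    -- Method 2 (INSERT ... SELECT) follows the same pattern, for example with a
+    -- hypothetical source table test2_src that has the same columns:
+    --   insert into test2 (id, name) select id, name from test2_src;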
+ select * from test2; + ┌─id─┬─name─┐ + │ 1 │ abc │ + │ 2 │ bbbb │ + └───┴────┘ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/select_querying_table_data.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/select_querying_table_data.rst new file mode 100644 index 0000000..61874f3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/select_querying_table_data.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_24203.html + +.. _mrs_01_24203: + +SELECT: Querying Table Data +=========================== + +This section describes the basic syntax and usage of the SQL statement for querying table data in ClickHouse. + +Basic Syntax +------------ + +**SELECT** [**DISTINCT**] expr_list + +[**FROM** [*database_name*.]\ *table* \| (subquery) \| table_function] [**FINAL**] + +[SAMPLE sample_coeff] + +[ARRAY **JOIN** ...] + +[**GLOBAL**] [**ANY**\ \|\ **ALL**\ \|\ **ASOF**] [**INNER**\ \|\ **LEFT**\ \|\ **RIGHT**\ \|\ **FULL**\ \|\ **CROSS**] [**OUTER**\ \|SEMI|ANTI] **JOIN** (subquery)\|\ **table** (**ON** )|(**USING** ) + +[PREWHERE expr] + +[**WHERE** expr] + +[**GROUP BY** expr_list] [**WITH** TOTALS] + +[**HAVING** expr] + +[**ORDER BY** expr_list] [**WITH** FILL] [**FROM** expr] [**TO** expr] [STEP expr] + +[**LIMIT** [offset_value, ]n **BY** columns] + +[**LIMIT** [n, ]m] [**WITH** TIES] + +[**UNION ALL** ...] + +[**INTO** OUTFILE filename] + +[FORMAT format] + +Example +------- + +.. code-block:: + + -- View ClickHouse cluster information. + select * from system.clusters; + -- View the macros set for the current node. + select * from system.macros; + -- Check the database capacity. + select + sum(rows) as "Total number of rows", + formatReadableSize(sum(data_uncompressed_bytes)) as "Original size", + formatReadableSize(sum(data_compressed_bytes)) as "Compression size", + round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, + 0) "Compression rate" + from system.parts; + -- Query the capacity of the test table. Add or modify the where clause based on the site requirements. + select + sum(rows) as "Total number of rows", + formatReadableSize(sum(data_uncompressed_bytes)) as "Original size", + formatReadableSize(sum(data_compressed_bytes)) as "Compression size", + round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, + 0) "Compression rate" + from system.parts + where table in ('test') + and partition like '2020-11-%' + group by table; diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/show_displaying_information_about_databases_and_tables.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/show_displaying_information_about_databases_and_tables.rst new file mode 100644 index 0000000..33ac41e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/show_displaying_information_about_databases_and_tables.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_24207.html + +.. _mrs_01_24207: + +SHOW: Displaying Information About Databases and Tables +======================================================= + +This section describes the basic syntax and usage of the SQL statement for displaying information about databases and tables in ClickHouse. + +Basic Syntax +------------ + +**show databases** + +**show tables** + +Example +------- + +.. code-block:: + + -- Query database information. 
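+    -- (Related statement: show create table <table_name> prints a table's full CREATE statement.)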
+ show databases; + ┌─name────┐ + │ default │ + │ system │ + │ test │ + └───────┘ + -- Query table information. + show tables; + ┌─name──┐ + │ t1 │ + │ test │ + │ test2 │ + │ test5 │ + └─────┘ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/creating_a_clickhouse_table.rst b/doc/component-operation-guide-lts/source/using_clickhouse/creating_a_clickhouse_table.rst new file mode 100644 index 0000000..15f7f1a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/creating_a_clickhouse_table.rst @@ -0,0 +1,319 @@ +:original_name: mrs_01_2398.html + +.. _mrs_01_2398: + +Creating a ClickHouse Table +=========================== + +ClickHouse implements the replicated table mechanism based on the ReplicatedMergeTree engine and ZooKeeper. When creating a table, you can specify an engine to determine whether the table is highly available. Shards and replicas of each table are independent of each other. + +ClickHouse also implements the distributed table mechanism based on the Distributed engine. Views are created on all shards (local tables) for distributed query, which is easy to use. ClickHouse has the concept of data sharding, which is one of the features of distributed storage. That is, parallel read and write are used to improve efficiency. + +The ClickHouse cluster table engine that uses Kunpeng as the CPU architecture does not support HDFS and Kafka. + +.. _mrs_01_2398__en-us_topic_0000001219029067_section1386435625: + +Viewing cluster and Other Environment Parameters of ClickHouse +-------------------------------------------------------------- + +#. Use the ClickHouse client to connect to the ClickHouse server by referring to :ref:`Using ClickHouse from Scratch `. + +#. .. _mrs_01_2398__en-us_topic_0000001219029067_li5153155032517: + + Query the cluster identifier and other information about the environment parameters. + + **select cluster,shard_num,replica_num,host_name from system.clusters;** + + .. code-block:: + + SELECT + cluster, + shard_num, + replica_num, + host_name + FROM system.clusters + + ┌─cluster───────────┬─shard_num─┬─replica_num─┬─host_name──────── ┐ + │ default_cluster_1 │ 1 │ 1 │ node-master1dOnG │ + │ default_cluster_1 │ 1 │ 2 │ node-group-1tXED0001 │ + │ default_cluster_1 │ 2 │ 1 │ node-master2OXQS │ + │ default_cluster_1 │ 2 │ 2 │ node-group-1tXED0002 │ + │ default_cluster_1 │ 3 │ 1 │ node-master3QsRI │ + │ default_cluster_1 │ 3 │ 2 │ node-group-1tXED0003 │ + └─────────────── ┴────── ┴─────── ┴──────────────┘ + + 6 rows in set. Elapsed: 0.001 sec. + +#. Query the shard and replica identifiers. + + **select \* from system.macros**; + + .. code-block:: + + SELECT * + FROM system.macros + + ┌─macro───┬─substitution─────┐ + │ id │ 76 │ + │ replica │ node-master3QsRI │ + │ shard │ 3 │ + └────── ┴────────────┘ + + 3 rows in set. Elapsed: 0.001 sec. + +.. _mrs_01_2398__en-us_topic_0000001219029067_section1564103819477: + +Creating a Local Replicated Table and a distributed Table +--------------------------------------------------------- + +#. Log in to the ClickHouse node using the client, for example, **clickhouse client --host** *node-master3QsRI* **--multiline --port 9440 --secure;** + + .. note:: + + *node-master3QsRI* is the value of **host_name** obtained in :ref:`2 ` in :ref:`Viewing cluster and Other Environment Parameters of ClickHouse `. + +#. .. _mrs_01_2398__en-us_topic_0000001219029067_li89698281356: + + Create a replicated table using the ReplicatedMergeTree engine. 
+ + For details about the syntax, see https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables. + + For example, run the following commands to create a ReplicatedMergeTree table named **test** on the **default_cluster_1** node and in the **default** database: + + **CREATE TABLE** *default.test* **ON CLUSTER** *default_cluster_1* + + **(** + + **\`EventDate\` DateTime,** + + **\`id\` UInt64** + + **)** + + **ENGINE = ReplicatedMergeTree('**\ */clickhouse/tables/{shard}/default/test*\ **', '**\ *{replica}*'**)** + + **PARTITION BY toYYYYMM(EventDate)** + + **ORDER BY id;** + + The parameters are described as follows: + + - The **ON CLUSTER** syntax indicates the distributed DDL, that is, the same local table can be created on all instances in the cluster after the statement is executed once. + - **default_cluster_1** is the cluster identifier obtained in :ref:`2 ` in :ref:`Viewing cluster and Other Environment Parameters of ClickHouse `. + + .. caution:: + + **ReplicatedMergeTree** engine receives the following two parameters: + + - Storage path of the table data in ZooKeeper + + The path must be in the **/clickhouse** directory. Otherwise, data insertion may fail due to insufficient ZooKeeper quota. + + To avoid data conflict between different tables in ZooKeeper, the directory must be in the following format: + + */clickhouse/tables/{shard}*\ **/**\ *default/test*, in which **/clickhouse/tables/{shard}** is fixed, *default* indicates the database name, and *text* indicates the name of the created table. + + - Replica name: Generally, **{replica}** is used. + + .. code-block:: + + CREATE TABLE default.test ON CLUSTER default_cluster_1 + ( + `EventDate` DateTime, + `id` UInt64 + ) + ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}') + PARTITION BY toYYYYMM(EventDate) + ORDER BY id + + ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ + │ node-group-1tXED0002 │ 9000 │ 0 │ │ 5 │ 3 │ + │ node-group-1tXED0003 │ 9000 │ 0 │ │ 4 │ 3 │ + │ node-master1dOnG │ 9000 │ 0 │ │ 3 │ 3 │ + └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ + ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ + │ node-master3QsRI │ 9000 │ 0 │ │ 2 │ 0 │ + │ node-group-1tXED0001 │ 9000 │ 0 │ │ 1 │ 0 │ + │ node-master2OXQS │ 9000 │ 0 │ │ 0 │ 0 │ + └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ + + 6 rows in set. Elapsed: 0.189 sec. + +#. .. _mrs_01_2398__en-us_topic_0000001219029067_li16616143173215: + + Create a distributed table using the Distributed engine. + + For example, run the following commands to create a distributed table named **test_all** on the **default_cluster_1** node and in the **default** database: + + **CREATE TABLE** *default.test_all* **ON CLUSTER** *default_cluster_1* + + **(** + + **\`EventDate\` DateTime,** + + **\`id\` UInt64** + + **)** + + **ENGINE = Distributed(**\ *default_cluster_1, default, test, rand()*\ **);** + + .. 
code-block:: + + CREATE TABLE default.test_all ON CLUSTER default_cluster_1 + ( + `EventDate` DateTime, + `id` UInt64 + ) + ENGINE = Distributed(default_cluster_1, default, test, rand()) + + ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ + │ node-group-1tXED0002 │ 9000 │ 0 │ │ 5 │ 0 │ + │ node-master3QsRI │ 9000 │ 0 │ │ 4 │ 0 │ + │ node-group-1tXED0003 │ 9000 │ 0 │ │ 3 │ 0 │ + │ node-group-1tXED0001 │ 9000 │ 0 │ │ 2 │ 0 │ + │ node-master1dOnG │ 9000 │ 0 │ │ 1 │ 0 │ + │ node-master2OXQS │ 9000 │ 0 │ │ 0 │ 0 │ + └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ + + 6 rows in set. Elapsed: 0.115 sec. + + .. note:: + + **Distributed** requires the following parameters: + + - **default_cluster_1** is the cluster identifier obtained in :ref:`2 ` in :ref:`Viewing cluster and Other Environment Parameters of ClickHouse `. + + - **default** indicates the name of the database where the local table is located. + + - **test** indicates the name of the local table. In this example, it is the name of the table created in :ref:`2 `. + + - (Optional) Sharding key + + This key and the weight configured in the **config.xml** file determine the route for writing data to the distributed table, that is, the physical table to which the data is written. It can be the original data (for example, **site_id**) of a column in the table or the result of the function call, for example, **rand()** is used in the preceding SQL statement. Note that data must be evenly distributed in this key. Another common operation is to use the hash value of a column with a large difference, for example, **intHash64(user_id)**. + +ClickHouse Table Data Operations +-------------------------------- + +#. Log in to the ClickHouse node on the client. For example, + + **clickhouse client --host** *node-master3QsRI* **--multiline --port 9440 --secure;** + + .. note:: + + *node-master3QsRI* is the value of **host_name** obtained in :ref:`2 ` in :ref:`Viewing cluster and Other Environment Parameters of ClickHouse `. + +#. .. _mrs_01_2398__en-us_topic_0000001219029067_li77990531075: + + After creating a table by referring to :ref:`Creating a Local Replicated Table and a distributed Table `, you can insert data to the local table. + + For example, run the following command to insert data to the local table **test**: + + **insert into test values(toDateTime(now()), rand());** + +#. Query the local table information. + + For example, run the following command to query data information of the table **test** in :ref:`2 `: + + **select \* from test;** + + .. code-block:: + + SELECT * + FROM test + + ┌───────────EventDate─┬─────────id─┐ + │ 2020-11-05 21:10:42 │ 1596238076 │ + └──────────────── ┴───────────┘ + + 1 rows in set. Elapsed: 0.002 sec. + +#. Query the distributed table. + + For example, the distributed table **test_all** is created based on table **test** in :ref:`3 `. Therefore, the same data in table **test** can also be queried in table **test_all**. + + **select \* from test_all;** + + .. code-block:: + + SELECT * + FROM test_all + + ┌───────────EventDate─┬─────────id─┐ + │ 2020-11-05 21:10:42 │ 1596238076 │ + └──────────────── ┴───────────┘ + + 1 rows in set. Elapsed: 0.004 sec. + +#. Switch to the shard node with the same **shard_num** and query the information about the current table. The same table data can be queried. + + For example, run the **exit;** command to exit the original node. 
+ + Run the following command to switch to the **node-group-1tXED0003** node: + + **clickhouse client --host** *node-group-1tXED0003* **--multiline --port 9440 --secure;** + + .. note:: + + The **shard_num** values of **node-group-1tXED0003** and **node-master3QsRI** are the same by performing :ref:`2 `. + + **show tables;** + + .. code-block:: + + SHOW TABLES + + ┌─name─────┐ + │ test │ + │ test_all │ + └────────┘ + +#. Query the local table data. For example, run the following command to query data in table **test** on the **node-group-1tXED0003** node: + + **select \* from test;** + + .. code-block:: + + SELECT * + FROM test + + ┌───────────EventDate─┬─────────id─┐ + │ 2020-11-05 21:10:42 │ 1596238076 │ + └──────────────── ┴───────────┘ + + 1 rows in set. Elapsed: 0.005 sec. + +#. Switch to the shard node with different **shard_num** value and query the data of the created table. + + For example, run the following command to exit the **node-group-1tXED0003** node: + + **exit;** + + Switch to the **node-group-1tXED0001** node. The **shard_num** values of **node-group-1tXED0001** and **node-master3QsRI** are different by performing :ref:`2 `. + + **clickhouse client --host** *node-group-1tXED0001* **--multiline --port 9440 --secure;** + + Query the local table **test**. Data cannot be queried on the different shard node because table **test** is a local table. + + **select \* from test;** + + .. code-block:: + + SELECT * + FROM test + + Ok. + + Query data in the distributed table **test_all**. The data can be queried properly. + + **select \* from test_all;** + + .. code-block:: + + SELECT * + FROM test + + ┌───────────EventDate─┬─────────id─┐ + │ 2020-11-05 21:12:19 │ 3686805070 │ + └──────────────── ┴───────────┘ + + 1 rows in set. Elapsed: 0.002 sec. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/index.rst new file mode 100644 index 0000000..fe6f5e6 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/index.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_2344.html + +.. _mrs_01_2344: + +Using ClickHouse +================ + +- :ref:`Using ClickHouse from Scratch ` +- :ref:`Common ClickHouse SQL Syntax ` +- :ref:`User Management and Authentication ` +- :ref:`ClickHouse Table Engine Overview ` +- :ref:`Creating a ClickHouse Table ` +- :ref:`Using the ClickHouse Data Migration Tool ` +- :ref:`Monitoring of Slow ClickHouse Query Statements and Replication Table Data Synchronization ` +- :ref:`Adaptive MV Usage in ClickHouse ` +- :ref:`ClickHouse Log Overview ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + using_clickhouse_from_scratch + common_clickhouse_sql_syntax/index + user_management_and_authentication/index + clickhouse_table_engine_overview + creating_a_clickhouse_table + using_the_clickhouse_data_migration_tool + monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/index + adaptive_mv_usage_in_clickhouse + clickhouse_log_overview diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/index.rst new file mode 100644 index 0000000..45fc15e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24229.html + +.. _mrs_01_24229: + +Monitoring of Slow ClickHouse Query Statements and Replication Table Data Synchronization +========================================================================================= + +- :ref:`Slow Query Statement Monitoring ` +- :ref:`Replication Table Data Synchronization Monitoring ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + slow_query_statement_monitoring + replication_table_data_synchronization_monitoring diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/replication_table_data_synchronization_monitoring.rst b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/replication_table_data_synchronization_monitoring.rst new file mode 100644 index 0000000..8035756 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/replication_table_data_synchronization_monitoring.rst @@ -0,0 +1,47 @@ +:original_name: mrs_01_24231.html + +.. _mrs_01_24231: + +Replication Table Data Synchronization Monitoring +================================================= + +Scenario +-------- + +MRS monitors the synchronization between multiple copies of data in the same shard of a Replicated*MergeTree table. + +Constraints +----------- + +Currently, you can monitor and query only Replicated*MergeTree tables whose creation statements contain the key word **ON CLUSTER**. + +Replication Table Data Synchronization +-------------------------------------- + +- **Procedure** + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. On the displayed page, click the **Data Synchronization Status** tab. + +- **Data synchronization parameters** + + .. table:: **Table 1** Data synchronization parameters + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+=============================================================================================================================+ + | Data Tables | Names of Replicated*MergeTree tables. 
| + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Shard | ClickHouse shard where the data table is located. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Status | Data synchronization status. The options are as follows: | + | | | + | | - **No data**: The table has no data on this shard. | + | | - **Synchronized**: The table has data on this shard, and multiple instance copies of the same shard are the same. | + | | - **Not synchronized**: The table has data on this shard, but multiple instance copies of the same shard are not the same. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Details | Data synchronization details of the data table on the corresponding ClickHouseServer instance. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + +- **Filter conditions** + + You can select **By Data Tables** and enter a data table name in the search box to filter data tables. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/slow_query_statement_monitoring.rst b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/slow_query_statement_monitoring.rst new file mode 100644 index 0000000..9d9e19f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/slow_query_statement_monitoring.rst @@ -0,0 +1,85 @@ +:original_name: mrs_01_24230.html + +.. _mrs_01_24230: + +Slow Query Statement Monitoring +=============================== + +Scenario +-------- + +The SQL statement query in ClickHouse is slow because the conditions such as partitions, where conditions, and indexes of SQL statements are set improperly. As a result, the overall performance of the database is affected. To solve this problem, MRS provides the function of monitoring slow ClickHouse query statements. + +Ongoing Slow Queries +-------------------- + +You can query information about slow SQL statements that are being executed but do not return any result. + +- **Procedure** + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. On the displayed page, click the **Query Management** tab and then the **Ongoing Slow Queries** tab. + +- **Parameters** + + .. _mrs_01_24230__en-us_topic_0000001173470902_table1578619528431: + + .. 
table:: **Table 1** Slow query parameters + + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +========================+=================================================================================================================================================================================================================+ + | Server Node IP Address | IP address of the ClickHouseServer instance. To view the IP address, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. On the displayed page, click the **Instance** tab. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query ID | Unique ID generated internally. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query | Slow query SQL statement. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Start Time | Time when the execution of a slow query SQL statement starts. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | End Time | Time when the execution of a slow query SQL statement ends. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Duration (s) | Total execution time of a slow query SQL statement, in seconds. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | User | ClickHouse user who executes a slow query SQL statement. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Client IP Address | IP address of the client that submits a slow query SQL statement. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Memory Used (MB) | Memory used by a slow query SQL statement, in MB. 
| + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Operation | You can click **Terminate** to terminate the slow query using a slow query SQL statement. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Filter conditions** + + Select the query condition as required and filter the query results. + + |image1| + + .. _mrs_01_24230__en-us_topic_0000001173470902_table12626134121116: + + .. table:: **Table 2** Filter conditions + + +-----------------------------------+--------------------------------------------------------------------------------------------------------+ + | Condition | Description | + +===================================+========================================================================================================+ + | Slow query duration exceeding | Filters the slow queries based on the duration. | + | | | + | | The value can be **3 (s)**, **9 (s)**, **15 (s)**, or **25 (s)**. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------+ + | By Query ID | Filters the slow queries based on the query ID. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------+ + | By User | Filters the slow queries based on the ClickHouse user. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------+ + | By Client IP Address | Filters the slow queries based on the IP address of the client that submits slow query SQL statements. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------+ + +Completed Queries +----------------- + +You can query information about slow SQL statements that have been executed and returned results. + +Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. On the displayed page, click the **Query Management** tab and then the **Completed Queries** tab. + +For details about slow query parameters and filter conditions, see :ref:`Table 1 ` and :ref:`Table 2 `, respectively. + +.. |image1| image:: /_static/images/en-us_image_0000001441092221.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/clickhouse_user_and_permission_management.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/clickhouse_user_and_permission_management.rst new file mode 100644 index 0000000..5c96eba --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/clickhouse_user_and_permission_management.rst @@ -0,0 +1,188 @@ +:original_name: mrs_01_24057.html + +.. _mrs_01_24057: + +ClickHouse User and Permission Management +========================================= + +User Permission Model +--------------------- + +ClickHouse user permission management enables unified management of users, roles, and permissions on each ClickHouse instance in the cluster. 
You can use the permission management module of the Manager UI to create users, create roles, and bind the ClickHouse access permissions. User permissions are controlled by binding roles to users. + +Resource management: :ref:`Table 1 ` lists the resources supported by ClickHouse permission management. + +Resource permissions: :ref:`Table 2 ` lists the resource permissions supported by ClickHouse. + +.. _mrs_01_24057__en-us_topic_0000001219230659_table858112220269: + +.. table:: **Table 1** Permission management objects supported by ClickHouse + + ======== ============= ============== + Resource Integration Remarks + ======== ============= ============== + Database Yes (level 1) ``-`` + Table Yes (level 2) ``-`` + View Yes (level 2) Same as tables + ======== ============= ============== + +.. _mrs_01_24057__en-us_topic_0000001219230659_table20282143414276: + +.. table:: **Table 2** Resource permission list + + ========== ==================== ===================================== + Resource Available Permission Remarks + ========== ==================== ===================================== + Database CREATE CREATE DATABASE/TABLE/VIEW/DICTIONARY + Table/View SELECT/INSERT ``-`` + ========== ==================== ===================================== + +Prerequisites +------------- + +- The ClickHouse and Zookeeper services are running properly. +- When creating a database or table in the cluster, the **ON CLUSTER** statement is used to ensure that the metadata of the database and table on each ClickHouse node is the same. + +.. note:: + + After the permission is granted, it takes about 1 minute for the permission to take effect. + +Adding the ClickHouse Role +-------------------------- + +#. Log in to Manager and choose **System** > **Permission** > **Role**. On the **Role** page, click **Create Role**. + + |image1| + +#. On the **Create Role** page, specify **Role Name**. In the **Configure Resource Permission** area, click the cluster name. On the service list page that is displayed, click the ClickHouse service. + + Determine whether to create a role with ClickHouse administrator permission based on service requirements. + + .. note:: + + - The ClickHouse administrator has all the database operation permissions except the permissions to create, delete, and modify users and roles. + - Only the built-in user **clickhouse** of ClickHouse has the permission to manage users and roles. + + - If yes, go to :ref:`3 `. + - If no, go to :ref:`4 `. + +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li9365913184120: + + Select **SUPER_USER_GROUP** and click **OK**. + +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li13347154819413: + + Click **ClickHouse Scope**. The ClickHouse database resource list is displayed. If you select **create**, the role has the create permission on the database. + + Determine whether to grant the permission based on the service requirements. + + - If yes, click **OK**. + - If no, go to :ref:`5 `. + +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li17964516204412: + + Click the resource name and select the *Database resource name to be operated*. On the displayed page, select **read** (SELECT permission) or **write** (INSERT permission) based on service requirements, and click **OK**. + +Adding a User and Binding the ClickHouse Role to the User +--------------------------------------------------------- + +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li1183214191540: + + Log in to Manager and choose **System** > **Permission** > **User** and click **Create**. 
+ +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li0521154115455: + + Select **Human-Machine** for **User Type** and set **Password** and **Confirm Password** to the password of the user. + + .. note:: + + - Username: The username cannot contain hyphens (-). Otherwise, the authentication will fail. + - Password: The password cannot contain special characters $, ., and #. Otherwise, the authentication will fail. + +#. In the **Role** area, click **Add**. In the displayed dialog box, select a role with the ClickHouse permission and click **OK** to add the role. Then, click **OK**. + +#. Log in to the node where the ClickHouse client is installed and use the new username and password to connect to the ClickHouse service. + + a. Run the following command to go to the client installation directory. For example, the client installation directory is /opt/Bigdata/client. + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The user must have the permission to create ClickHouse tables. Therefore, you need to bind the corresponding role to the user. For details, see :ref:`ClickHouse User and Permission Management `. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *Component service user* + + Example: **kinit clickhouseuser** + + d. Log in to the system as the new user. + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *User added in :ref:`1 `* **--password** *User password set in :ref:`2 `* **--port** *ClickHouse port number* + +Granting Permissions Using the Client in Abnormal Scenarios +----------------------------------------------------------- + +By default, the table metadata on each node of the ClickHouse cluster is the same. Therefore, the table information on a random ClickHouse node is collected on the permission management page of Manager. If the **ON CLUSTER** statement is not used when databases or tables are created on some nodes, the resource may fail to be displayed during permission management, and permissions may not be granted to the resource. To grant permissions on the local table on a single ClickHouse node, perform the following steps on the background client. + +.. note:: + + The following operations are performed based on the obtained roles, database or table names, and IP addresses of the node where the corresponding ClickHouseServer instance is located. + + - You can log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse** > **Instance** to obtain the service IP address of the ClickHouseServer instance. + - System domain name: The default value is **hadoop.com**. Log in to FusionInsight Manager and choose **System** > **Permission** > **Domain and Mutual Trust**. The value of **Local Domain** is the system domain name. Change the letters to lowercase letters when running a command. + +#. Log in to the node where the ClickHouseServer instance is located as user **root**. + +#. .. _mrs_01_24057__en-us_topic_0000001219230659_li10408141903516: + + Run the following command to obtain the path of the **clickhouse.keytab** file: + + **ls ${BIGDATA_HOME}/FusionInsight_ClickHouse_*/install/FusionInsight-ClickHouse-*/clickhouse/keytab/clickhouse.keytab** + +#. Log in to the node where the client is installed as the client installation user. + +#. 
Run the following command to go to the client installation directory. For example, the client installation directory is /opt/Bigdata/client. + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Run the following command to connect to the ClickHouseServer instance: + + If Kerberos authentication is enabled for the current cluster, run the following command: + + **clickhouse client --host** *IP address of the node where the ClickHouseServer instance is located* **--user clickhouse/hadoop.**\ ** **--password** *clickhouse.keytab path obtained in :ref:`2 `* **--port** *ClickHouse port number* **--secure** + + If Kerberos authentication is disabled for the current cluster, run the following command: + + **clickhouse client --host** *IP address of the node where the ClickHouseServer instance is located* **--user clickhouse** **--port** *ClickHouse port number* + +#. Run the following statement to grant permissions to a database: + + In the syntax for granting permissions, *DATABASE* indicates the name of the target database, and *role* indicates the target role. + + **GRANT** **[ON CLUSTER** *cluster_name*\ **]** *privilege* **ON** *{DATABASE|TABLE}* **TO** *{user \| role]* + + For example, grant user **testuser** the CREATE permission on database **t2**: + + **GRANT CREATE ON** *m2* **to** *testuser*\ **;** + +#. Run the following commands to grant permissions on the table or view. In the following command, *TABLE* indicates the name of the table or view to be operated, and *user* indicates the role to be operated. + + Run the following command to grant the query permission on tables in a database: + + **GRANT SELECT ON** *TABLE* **TO** *user*\ **;** + + Run the following command to grant the write permission on tables in a database: + + **GRANT INSERT ON** *TABLE* **TO** *user*\ **;** + +#. Run the following command to exit the client: + + **quit;** + +.. |image1| image:: /_static/images/en-us_image_0000001387892350.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst new file mode 100644 index 0000000..fa531e9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24251.html + +.. _mrs_01_24251: + +User Management and Authentication +================================== + +- :ref:`ClickHouse User and Permission Management ` +- :ref:`Setting the ClickHouse Username and Password ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + clickhouse_user_and_permission_management + setting_the_clickhouse_username_and_password diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst new file mode 100644 index 0000000..3117910 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst @@ -0,0 +1,45 @@ +:original_name: mrs_01_2395.html + +.. 
_mrs_01_2395: + +Setting the ClickHouse Username and Password +============================================ + +After a ClickHouse cluster is created, you can use the ClickHouse client to connect to the ClickHouse server. The default username is **default**. + +This section describes how to set ClickHouse username and password after a ClickHouse cluster is successfully created. + +.. note:: + + **default** is the default internal user of ClickHouse. It is an administrator user available only in normal mode (kerberos authentication disabled). + + +Setting the ClickHouse Username and Password +-------------------------------------------- + +#. Log in to Manager and choose **Cluster** > **Services** > **ClickHouse**. Click the **Configurations** tab and then **All Configurations**. + +#. Search for the **users.default.password** parameter in the search box and change its password, as shown in :ref:`Figure 1 `. + + .. _mrs_01_2395__fig42238424419: + + .. figure:: /_static/images/en-us_image_0000001296059672.png + :alt: **Figure 1** Changing the default user password + + **Figure 1** Changing the default user password + +#. Log in to the node where the client is installed and run the following command to switch to the client installation directory. + + **cd**\ *Cluster client installation directory* + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Log in to ClickHouse using the new password. + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *default* **--password** *xxx* + + .. note:: + + To obtain the service IP address of the ClickHouse instance, choose **Components** > **ClickHouse** > **Instances** on the cluster details page. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/using_clickhouse_from_scratch.rst b/doc/component-operation-guide-lts/source/using_clickhouse/using_clickhouse_from_scratch.rst new file mode 100644 index 0000000..cb5485d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/using_clickhouse_from_scratch.rst @@ -0,0 +1,117 @@ +:original_name: mrs_01_2345.html + +.. _mrs_01_2345: + +Using ClickHouse from Scratch +============================= + +ClickHouse is a column-based database oriented to online analysis and processing. It supports SQL query and provides good query performance. The aggregation analysis and query performance based on large and wide tables is excellent, which is one order of magnitude faster than other analytical databases. + +Prerequisites +------------- + +You have installed the client, for example, in the **/opt/hadoopclient** directory. The client directory in the following operations is only an example. Change it to the actual installation directory. Before using the client, download and update the client configuration file, and ensure that the active management node of Manager is available. + +Procedure +--------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client installation directory: + + **cd /opt/hadoopclient** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The current user must have the permission to create ClickHouse tables. For details about how to bind a role to the user, see :ref:`ClickHouse User and Permission Management `. 
If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *Component service user* + + Example: **kinit clickhouseuser** + +#. Run the client command of the ClickHouse component. + + Run the **clickhouse -h** command to view the command help of ClickHouse. + + The command output is as follows: + + .. code-block:: + + Use one of the following commands: + clickhouse local [args] + clickhouse client [args] + clickhouse benchmark [args] + clickhouse server [args] + clickhouse performance-test [args] + clickhouse extract-from-config [args] + clickhouse compressor [args] + clickhouse format [args] + clickhouse copier [args] + clickhouse obfuscator [args] + ... + + The following table describes the parameters when the **clickhouse client** command is used to connect to the ClickHouse server. + + .. table:: **Table 1** Parameters of the clickhouse client command + + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===================================================================================================================================================================================================================================================================================================================================================================+ + | --host | Host name of the server. The default value is **localhost**. You can use the host name or IP address of the node where the ClickHouse instance is located. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --port | Port for connection. | + | | | + | | - If the SSL security connection is used, the default port number is **9440**, the parameter **--secure** must be carried. For details about the port number, search for the **tcp_port_secure** parameter in the ClickHouseServer instance configuration. | + | | - If non-SSL security connection is used, the default port number is **9000**, the parameter **--secure** does not need to be carried. For details about the port number, search for the **tcp_port** parameter in the ClickHouseServer instance configuration. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --user | Username. | + | | | + | | You can create the user on Manager and bind a role to the user. For details, see :ref:`ClickHouse User and Permission Management `. 
| + | | | + | | - If Kerberos authentication is enabled for the current cluster and the user authentication is successful, you do not need to carry the **--user** and **--password** parameters when logging in to the client as the authenticated user. You must create a user with this name on Manager because there is no default user in the Kerberos cluster scenario. | + | | - If Kerberos authentication is not enabled for the current cluster, you can specify a user and its password created on Manager when logging in to the client. If the user is used for the first time, you need to log in to Manager to change the password. If the user and password parameters are not carried, user **default** is used for login by default. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --password | Password. The default password is an empty string. This parameter is used together with the **--user** parameter. You can set a password when creating a user on Manager. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --query | Query to process when using non-interactive mode. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --database | Current default database. The default value is **default**, which is the default configuration on the server. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --multiline | If this parameter is specified, multiline queries are allowed. (**Enter** only indicates line feed and does not indicate that the query statement is complete.) | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --multiquery | If this parameter is specified, multiple queries separated with semicolons (;) can be processed. This parameter is valid only in non-interactive mode. 
| + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --format | Specified default format used to output the result. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --vertical | If this parameter is specified, the result is output in vertical format by default. In this format, each value is printed on a separate line, which helps to display a wide table. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --time | If this parameter is specified, the query execution time is printed to **stderr** in non-interactive mode. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --stacktrace | If this parameter is specified, stack trace information will be printed when an exception occurs. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --config-file | Name of the configuration file. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --secure | If this parameter is specified, the server will be connected in SSL mode. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --history_file | Path of files that record command history. 
| + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | --param_ | Query with parameters. Pass values from the client to the server. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + - Using SSL for login when Kerberos authentication is disabled for the current cluster: + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *Username* **--password** *Password* **--port** 9440 **--secure** + + - Using SSL for login when Kerberos authentication is enabled for the current cluster: + + You must create a user on Manager because there is no default user. For details, see :ref:`ClickHouse User and Permission Management `. + + After the user authentication is successful, you do not need to carry the **--user** and **--password** parameters when logging in to the client as the authenticated user. + + **clickhouse client --host** *IP address of the ClickHouse instance* **--port** 9440 **--secure** + + .. note:: + + You can log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse** > **Instance** to obtain the service IP address of the ClickHouseServer instance. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/using_the_clickhouse_data_migration_tool.rst b/doc/component-operation-guide-lts/source/using_clickhouse/using_the_clickhouse_data_migration_tool.rst new file mode 100644 index 0000000..c661941 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/using_the_clickhouse_data_migration_tool.rst @@ -0,0 +1,91 @@ +:original_name: mrs_01_24053.html + +.. _mrs_01_24053: + +Using the ClickHouse Data Migration Tool +======================================== + +The ClickHouse data migration tool can migrate some partitions of one or more partitioned MergeTree tables on several ClickHouseServer nodes to the same tables on other ClickHouseServer nodes. In the capacity expansion scenario, you can use this tool to migrate data from an original node to a new node to balance data after capacity expansion. + +Prerequisites +------------- + +- The ClickHouse and Zookeeper services are running properly. The ClickHouseServer instances on the source and destination nodes are normal. +- The destination node has the data table to be migrated and the table is a partitioned MergeTree table. +- Before creating a migration task, ensure that all tasks for writing data to a table to be migrated have been stopped. After the task is started, you can only query the table to be migrated and cannot write data to or delete data from the table. Otherwise, data may be inconsistent before and after the migration. +- If automatic balancing is enabled, only a partitioned ReplicatedMergeTree table is migrated, and the partitioned table must have a corresponding distributed table. 
+- The ClickHouse data directory on the destination node has sufficient space. + +Procedure +--------- + +#. Log in to Manager and choose **Cluster** > **Services** > **ClickHouse** > **Data Migration**. On the displayed page, click **Add Task**. + + .. note:: + + - The number of created migration tasks is limited. By default, a maximum of 20 migration tasks can be created. You can modify the number of migration tasks allowed by modifying the **max_migration_task_number** configuration item on the ClickHouse configuration page of Manager. A migration task occupies a certain number of Znodes on ZooKeeper. Therefore, you are not advised to set the maximum number of migration tasks allowed to a large value. + - If the number of existing migration tasks exceeds the upper limit, no more migration tasks can be created. The system automatically deletes the earliest migration tasks that have been successfully executed. If no historical migration task is successfully executed, perform :ref:`10 ` based on the site requirements and manually delete historical migration tasks. + +2. On the page for creating a migration task, set the migration task parameters. For details, see :ref:`Table 1 `. After configuring the parameters, click **Next**. If **Automatic balancing** is not enabled, go to :ref:`3 `. If **Automatic balancing** is enabled, go to :ref:`5 `. + + .. _mrs_01_24053__en-us_topic_0000001173789598_table1724256152117: + + .. table:: **Table 1** Migration task parameters + + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+================================================================================================================================================================================================================================================================================+ + | Task Name | Enter a specific task name. The value can contain 1 to 50 characters, including letters, digits, and underscores (_), and cannot be the same as that of an existing migration task. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Automatic balancing | Choose whether to enable **Automatic balancing**. | + | | | + | | - If it is enabled, the system automatically selects the source and destination nodes to ensure that data in partitioned ReplicatedMergeTree tables corresponding to the distributed tables is evenly distributed on each node in the cluster. | + | | - If it is not enabled, you need to manually select the source and destination nodes. 
| + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Task Type | - **Scheduled Task**: When the scheduled task is selected, you can set **Started** to specify a time point later than the current time to execute the task. | + | | - **Immediate task**: The task is executed immediately after it is started. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Started | Set this parameter when **Task Type** is set to **Scheduled Task**. The valid value is a time point within 90 days from now. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Maximum Bandwidth | Bandwidth upper limit of each ClickHouseServer node. The value ranges from 1 MB/s to 10,000 MB/s. In automatic balancing scenarios, increase the bandwidth as much as possible. Flow control is disabled by default. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Data Amount Migrated | Percentage of the amount of data migrated in each table to the total amount of data in the table. The value ranges from 0 to 100%. If this parameter is left blank, the value is set to 50% by default. This parameter is valid only when **Automatic balancing** is disabled. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +3. .. _mrs_01_24053__en-us_topic_0000001173789598_li117062415510: + + On the **Select Node** page, select the data table to be migrated from the list on the left and click **OK**. In the list on the right, select the source node of the selected data table and click **Next**. + + .. note:: + + After a node is selected, other nodes that are replicas of the selected node are automatically selected as the source nodes. + +4. On the **Select Data Table** page, select the host name of the destination node and click **Next**. + + .. note:: + + The destination node must be different from the source nodes. The selected source nodes are not displayed on the page. + +5. .. _mrs_01_24053__en-us_topic_0000001173789598_li16513141410554: + + Confirm the task information and click **Submit**. + + The data migration tool automatically calculates the partitions to be migrated based on the size of the data table to be migrated and the value of **Data Amount Migrated** set on the **Add Task** page. + +6. 
After the migration task is submitted, click **Start** in the **Operation** column. If the task is an immediate task, the task starts to be executed. If the task is a scheduled task, the countdown starts. + +7. During the migration task execution, you can click **Cancel** to cancel the migration task. If the migration task is canceled, the migrated data on the destination node will not be rolled back. + + .. caution:: + + After a task with automatic balancing enabled is canceled, it will not stop immediately. It will stop after the table migration is complete. After a task with automatic balancing disabled is canceled, the task stops immediately. After the task stops, a partition may have been migrated to the destination node, but it is not deleted from the source node. In this case, duplicate data exists. Manually check whether the migrated partition still exists on the source node. If it still exists, check that the total data volume of the partition on the destination node is the same as that on the source node, and then delete the partition from the source node. + +8. Choose **More** > **Details** to view details about the migration task. + +9. After the migration is complete, choose **More** > **Results** to view the migration result. + + In a non-automatic balancing task, you can view the migrated partitions of each table and the partition migration result. If the partition migration is not finished, the partition has been copied to the destination node but not deleted from the source node because the data volume of the partition on the source node is inconsistent with that of the partition on the destination node. In this case, check whether the data volume of the partition on the source node is consistent with that of the partition on the destination node, and then delete the partition from the source node. + +10. .. _mrs_01_24053__en-us_topic_0000001173789598_li246411286161: + + After the migration is complete, choose **More** > **Delete** to delete the directories related to the migration task on ZooKeeper and the destination node. diff --git a/doc/component-operation-guide-lts/source/using_dbservice/configuring_ssl_for_the_ha_module.rst b/doc/component-operation-guide-lts/source/using_dbservice/configuring_ssl_for_the_ha_module.rst new file mode 100644 index 0000000..0171f9c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_dbservice/configuring_ssl_for_the_ha_module.rst @@ -0,0 +1,67 @@ +:original_name: mrs_01_2346.html + +.. _mrs_01_2346: + +Configuring SSL for the HA Module +================================= + +Scenario +-------- + +This section describes how to manually configure SSL for the HA module of DBService in the cluster where DBService is installed. + +.. note:: + + After this operation is performed, if you need to restore the SSL configuration, go to :ref:`Restoring SSL for the HA Module `. + +Prerequisites +------------- + +- The cluster has been installed. +- The **root-ca.crt** and **root-ca.pem** files in the **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/security** directory on the active and standby DBService nodes are the same. + +Procedure +--------- + +#. Log in to the DBService node where SSL needs to be configured as user **omm**. + +#. .. 
_mrs_01_2346__en-us_topic_0000001173471440_li1682110356353:
+
+   Go to the **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/sbin/** directory and run the following command:
+
+   **./proceed_ha_ssl_cert.sh** *DBService installation directory* *Service IP address of the node*
+
+   Example:
+
+   **cd $BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/sbin/**
+
+   **./proceed_ha_ssl_cert.sh $BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0** **10.10.10.10**
+
+   .. note::
+
+      **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0** is the installation directory of DBService. Modify it based on site requirements.
+
+#. Go to the **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/ha/module/hacom/script/** directory and run the following commands to restart the HA process:
+
+   **./stop_ha.sh**
+
+   **./start_ha.sh**
+
+#. Run the following command on the preceding node to obtain the PID of the HA process:
+
+   **ps -ef \|grep "ha.bin" \|grep DBSERVICE**
+
+#. Run the following command to check whether the protocol is changed to TCP:
+
+   **netstat -nap \| grep** *pid* **\|** **grep -v unix**
+
+   - If yes, no further action is required.
+   - If no, go to :ref:`2 `.
+
+   .. code-block::
+
+      (Not all processes could be identified, non-owned process info
+      will not be shown, you would have to be root to see it all.)
+      tcp    0    0 127.0.0.1:20054      0.0.0.0:*            LISTEN       11896/ha.bin
+      tcp    0    0 10.10.10.10:20052    10.10.10.14:20052    ESTABLISHED  11896/ha.bin
+      tcp    0    0 10.10.10.10:20053    10.10.10.14:20053    ESTABLISHED  11896/ha.bin diff --git a/doc/component-operation-guide-lts/source/using_dbservice/configuring_the_timeout_interval_of_dbservice_backup_tasks.rst b/doc/component-operation-guide-lts/source/using_dbservice/configuring_the_timeout_interval_of_dbservice_backup_tasks.rst new file mode 100644 index 0000000..40f1e6b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_dbservice/configuring_the_timeout_interval_of_dbservice_backup_tasks.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_24283.html
+
+.. _mrs_01_24283:
+
+Configuring the Timeout Interval of DBService Backup Tasks
+==========================================================
+
+Scenario
+--------
+
+The default timeout interval of DBService backup tasks is 2 hours. When the data volume in DBService is too large, the backup task may fail to be executed because the timeout interval is reached.
+
+This section describes how to customize the timeout interval of a DBService backup task.
+
+Prerequisites
+-------------
+
+- Clusters have been properly installed.
+- DBService is running properly.
+
+Procedure
+---------
+
+#. .. _mrs_01_24283__en-us_topic_0000001219029805_li140316245439:
+
+   Log in to the active OMS node as user **omm** using PuTTY. In the **${CONTROLLER_HOME}/etc/om/controller.properties** configuration file, change the value of **controller.backup.conf.script.execute.timeout** to **10000000s** (set the timeout interval based on the data volume of DBService).
+
+#. Log in to the standby OMS node as user **omm** using PuTTY and repeat step :ref:`1 `.
+
+#. Log in to the active OMS node as user **omm** using PuTTY, run the following command to query the ID of the **BackupRecoveryPluginProcess** process, and stop the process.
+
+   **jps|grep -i BackupRecoveryPluginProcess**
+
+   **kill -9** *Queried process ID*
+
+#. 
Log in to FusionInsight Manager and perform the DBService backup task again. + +#. Run the following command to check whether the **BackupRecoveryPluginProcess** process is started: + + **jps|grep -i BackupRecoveryPluginProcess** diff --git a/doc/component-operation-guide-lts/source/using_dbservice/dbservice_log_overview.rst b/doc/component-operation-guide-lts/source/using_dbservice/dbservice_log_overview.rst new file mode 100644 index 0000000..b717d29 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_dbservice/dbservice_log_overview.rst @@ -0,0 +1,100 @@ +:original_name: mrs_01_0789.html + +.. _mrs_01_0789: + +DBService Log Overview +====================== + +Log Description +--------------- + +**Log path**: The default storage path of DBService log files is **/var/log/Bigdata/dbservice**. + +- GaussDB: **/var/log/Bigdata/dbservice/DB** (GaussDB run log directory), **/var/log/Bigdata/dbservice/scriptlog/gaussdbinstall.log** (GaussDB installation log), and **/var/log/gaussdbuninstall.log** (GaussDB uninstallation log). + +- HA: **/var/log/Bigdata/dbservice/ha/runlog** (HA run log directory) and **/var/log/Bigdata/dbservice/ha/scriptlog** (HA script log directory) + +- DBServer: **/var/log/Bigdata/dbservice/healthCheck** (Directory of service and process health check logs) + + **/var/log/Bigdata/dbservice/scriptlog** (run log directory), **/var/log/Bigdata/audit/dbservice/** (audit log directory) + +Log archive rule: The automatic DBService log compression function is enabled. By default, when the size of logs exceeds 1 MB, logs are automatically compressed into a log file named in the following format: *-[No.]*\ **.gz**. A maximum of 20 latest compressed files are reserved. + +.. note:: + + Log archive rules cannot be modified. + +.. table:: **Table 1** DBService log list + + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | Type | Log File Name | Description | + +=======================+============================+===================================================================================================================================+ + | DBServer run log | dbservice_serviceCheck.log | Run log file of the service check script | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | dbservice_processCheck.log | Run log file of the process check script | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | backup.log | Run logs of backup and restoration operations (The DBService backup and restoration operations need to be performed.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | checkHaStatus.log | Log file of HA check records | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | cleanupDBService.log | Uninstallation log file (You need to uninstall DBService logs.) 
| + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | componentUserManager.log | Log file that records the adding and deleting operations on the database by users | + | | | | + | | | (Services that depend on DBService need to be added.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | install.log | Installation log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | preStartDBService.log | Pre-startup log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | start_dbserver.log | DBServer startup operation log file (DBService needs to be started.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | stop_dbserver.log | DBServer stop operation log file (DBService needs to be stopped.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | status_dbserver.log | Log file of the DBServer status check (You need to execute the **$DBSERVICE_HOME/sbin/status-dbserver.sh** script.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | modifyPassword.log | Run log file of changing the DBService password script. (You need to execute the **$DBSERVICE_HOME/sbin/modifyDBPwd.sh** script.) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | modifyDBPwd_yyyy-mm-dd.log | Run log file that records the DBService password change tool | + | | | | + | | | (You need to execute the **$DBSERVICE_HOME/sbin/modifyDBPwd.sh** script.) 
| + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | dbserver_switchover.log | Log for DBServer to execute the active/standby switchover script (the active/standby switchover needs to be performed) | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | GaussDB run log | gaussdb.log | Log file that records database running information | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | gs_ctl-current.log | Log file that records operations performed by using the **gs_ctl** tool | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | gs_guc-current.log | Log file that records operations, mainly parameter modification performed by using the **gs_guc** tool | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | gaussdbinstall.log | GaussDB installation log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | gaussdbuninstall.log | GaussDB uninstallation log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | HA script run log | floatip_ha.log | Log file that records the script of floating IP addresses | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | gaussDB_ha.log | Log file that records the script of GaussDB resources | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | ha_monitor.log | Log file that records the HA process monitoring information | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | send_alarm.log | Alarm sending log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | | ha.log | HA run log file | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + | DBService audit log | dbservice_audit.log | Audit log file that records DBService operations, such as backup and restoration operations | + +-----------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------+ + +Log 
Format +---------- + +The following table lists the DBService log formats. + +.. table:: **Table 2** Log format + + +-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +===========+=============================================================================================================================================================================+======================================================================================================================================================+ + | Run log | [<*yyyy-MM-dd HH:mm:ss*>] <*Log level*>: [< *Name of the script that generates the log*: *Line number* >]: < *Message in the log*> | [2020-12-19 15:56:42] INFO [postinstall.sh:653] Is cloud flag is false. (main) | + +-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | [<*yyyy-MM-dd HH:mm:ss,SSS*>] UserName:<*Username*> UserIP:<*User IP address*> Operation:<*Operation content*> Result:<*Operation results*> Detail:<*Detailed information*> | [2020-05-26 22:00:23] UserName:omm UserIP:192.168.10.21 Operation:DBService data backup Result: SUCCESS Detail: DBService data backup is successful. | + +-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_dbservice/index.rst b/doc/component-operation-guide-lts/source/using_dbservice/index.rst new file mode 100644 index 0000000..a036a3e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_dbservice/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_2356.html + +.. _mrs_01_2356: + +Using DBService +=============== + +- :ref:`Configuring SSL for the HA Module ` +- :ref:`Restoring SSL for the HA Module ` +- :ref:`Configuring the Timeout Interval of DBService Backup Tasks ` +- :ref:`DBService Log Overview ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_ssl_for_the_ha_module + restoring_ssl_for_the_ha_module + configuring_the_timeout_interval_of_dbservice_backup_tasks + dbservice_log_overview diff --git a/doc/component-operation-guide-lts/source/using_dbservice/restoring_ssl_for_the_ha_module.rst b/doc/component-operation-guide-lts/source/using_dbservice/restoring_ssl_for_the_ha_module.rst new file mode 100644 index 0000000..3504c16 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_dbservice/restoring_ssl_for_the_ha_module.rst @@ -0,0 +1,75 @@ +:original_name: mrs_01_2347.html + +.. _mrs_01_2347: + +Restoring SSL for the HA Module +=============================== + +Scenario +-------- + +This section describes how to restore SSL for the HA module of DBService in the cluster where DBService is installed. 
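+
+Before you restore the configuration, you can optionally confirm that SSL is currently enabled for the HA module. The following is a minimal check sketch, not a mandatory step; it assumes the default DBService installation path used in this section (replace *x.x.x* with the actual version) and simply searches **hacom.xml** for SSL-related elements, as described in the prerequisites below.
+
+.. code-block::
+
+   # Assumed default DBService installation path; adjust it based on site requirements.
+   cd $BIGDATA_HOME/FusionInsight_BASE_x.x.x/install/FusionInsight-dbservice-2.7.0/ha/module/hacom/conf/
+   # Matching lines indicate that SSL is still enabled for the HA module.
+   grep -n "ssl>" hacom.xml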
+ +Prerequisites +------------- + +SSL has been enabled for the HA module of DBService. + +.. note:: + + Check whether SSL is enabled for the HA module of DBService. + + Check **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/ha/module/hacom/conf/hacom.xml**. If the file contains ****, SSL is enabled. + +Procedure +--------- + +#. Log in to the DBService node where SSL needs to be restored as user **omm**. + +#. Run the following commands to restore the DBService configuration file **hacom_local.xml**: + + **cd $BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/ha/local/hacom/conf/** + + **cp hacom_local.xml $BIGDATA_HOME/tmp/** + + **cat hacom_local.xml \| grep "ssl>" -n \| cut -d':' -f1 \| xargs \| sed 's/ /,/g' \|xargs -n 1 -i sed -i '{}d' hacom_local.xm**\ l + +#. Run the following commands to restore the DBService configuration file **hacom.xml**: + + **cd $BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/ha/module/hacom/conf/** + + **cp hacom.xml $BIGDATA_HOME/tmp/** + + **sed -i 's##g' hacom.xml** + + **sed -i 's##g' hacom.xml** + + .. note:: + + **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0** is the installation directory of DBService. Modify it based on the upgrade environment. + +#. Go to the **$BIGDATA_HOME/FusionInsight_BASE\_**\ *x.x.x*\ **/install/FusionInsight-dbservice-2.7.0/ha/module/hacom/script/** directory and run the following command to restart the HA process: + + **./stop_ha.sh** + + **./start_ha.sh** + +#. Run the following command to obtain the PID of the HA process: + + **ps -ef \|grep "ha.bin" \|grep DBSERVICE** + +#. Run the following command to check whether the protocol is changed to TCP: + + **netstat -nap \| grep** *pid* **\|** **grep -v unix** + + - If yes, no further action is required. + - If no, contact O&M support. + + .. code-block:: console + + [omm@host03]\>netstat -nap | grep 49989 + (Not all processes could be identified, non-owned process info + will not be shown, you would have to be root to see it all.) + tcp 0 0 127.0.0.1:20054 0.0.0.0:* LISTEN 49989/ha.bin + udp 0 0 10.10.10.10:20052 0.0.0.0:* 49989/ha.bin + udp 0 0 10.10.10.10:20053 0.0.0.0:* 49989/ha.bin diff --git a/doc/component-operation-guide-lts/source/using_flink/common_flink_shell_commands.rst b/doc/component-operation-guide-lts/source/using_flink/common_flink_shell_commands.rst new file mode 100644 index 0000000..44b312c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/common_flink_shell_commands.rst @@ -0,0 +1,176 @@ +:original_name: mrs_01_0598.html + +.. _mrs_01_0598: + +Common Flink Shell Commands +=========================== + +Before running the Flink shell commands, perform the following steps: + +#. Install the Flink client in a directory, for example, **/opt/client**. + +#. Run the following command to initialize environment variables: + + **source /opt/client/bigdata_env** + +#. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step. + + **kinit** *Service user* + +#. Run the related commands according to :ref:`Table 1 `. + + .. _mrs_01_0598__en-us_topic_0000001219149715_table65101640171215: + + .. 
table:: **Table 1** Flink Shell commands + + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Command | Description | Description | + +==============================================================+===========================================================================================================================================================================================================================================================+=====================================================================================================================================================================================================================================================================================+ + | yarn-session.sh | **-at,--applicationType **: Defines the Yarn application type. | Start a resident Flink cluster to receive tasks from the Flink client. | + | | | | + | | **-D **: Configures dynamic parameter. | | + | | | | + | | **-d,--detached**: Disables the interactive mode and starts a separate Flink Yarn session. | | + | | | | + | | **-h,--help**: Displays the help information about the Yarn session CLI. | | + | | | | + | | **-id,--applicationId **: Binds to a running Yarn session. | | + | | | | + | | **-j,--jar **: Sets the path of the user's JAR file. | | + | | | | + | | **-jm,--jobManagerMemory **: Sets the JobManager memory. | | + | | | | + | | **-m,--jobmanager **: Address of the JobManager (master) to which to connect. Use this parameter to connect to a specified JobManager. | | + | | | | + | | **-nl,--nodeLabel **: Specifies the nodeLabel of the Yarn application. | | + | | | | + | | **-nm,--name **: Customizes a name for the application on Yarn. | | + | | | | + | | **-q,--query**: Queries available Yarn resources. | | + | | | | + | | **-qu,--queue **: Specifies a Yarn queue. | | + | | | | + | | **-s,--slots **: Sets the number of slots for each TaskManager. | | + | | | | + | | **-t,--ship **: specifies the directory of the file to be sent. | | + | | | | + | | **-tm,--taskManagerMemory **: sets the TaskManager memory. | | + | | | | + | | **-yd,--yarndetached**: starts Yarn in the detached mode. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-h**: Gets help information. | | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink run | **-c,--class **: Specifies a class as the entry for running programs. 
| Submit a Flink job. | + | | | | + | | **-C,--classpath **: Specifies **classpath**. | 1. The **-y\*** parameter is used in the **yarn-cluster** mode. | + | | | | + | | **-d,--detached**: Runs a job in the detached mode. | 2. If the parameter is not **-y\***, you need to run the **yarn-session** command to start the Flink cluster before running this command to submit a task. | + | | | | + | | **-files,--dependencyFiles **: File on which the Flink program depends. | | + | | | | + | | **-n,--allowNonRestoredState**: A state that cannot be restored can be skipped during restoration from a snapshot point in time. For example, if an operator in the program is deleted, you need to add this parameter when restoring the snapshot point. | | + | | | | + | | **-m,--jobmanager **: Specifies the JobManager. | | + | | | | + | | **-p,--parallelism **: Specifies the job DOP, which will overwrite the DOP parameter in the configuration file. | | + | | | | + | | **-q,--sysoutLogging**: Disables the function of outputting Flink logs to the console. | | + | | | | + | | **-s,--fromSavepoint **: Specifies a savepoint path for recovering jobs. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-yat,--yarnapplicationType **: Defines the Yarn application type. | | + | | | | + | | **-yD **: Dynamic parameter configuration. | | + | | | | + | | **-yd,--yarndetached**: Starts Yarn in the detached mode. | | + | | | | + | | **-yh,--yarnhelp**: Obtains the Yarn help. | | + | | | | + | | **-yid,--yarnapplicationId **: Binds a job to a Yarn session. | | + | | | | + | | **-yj,--yarnjar **: Sets the path to Flink jar file. | | + | | | | + | | **-yjm,--yarnjobManagerMemory **: Sets the JobManager memory (MB). | | + | | | | + | | **-ynm,--yarnname **: Customizes a name for the application on Yarn. | | + | | | | + | | **-yq,--yarnquery**: Queries available Yarn resources (memory and CPUs). | | + | | | | + | | **-yqu,--yarnqueue **: Specifies a Yarn queue. | | + | | | | + | | **-ys,--yarnslots**: Sets the number of slots for each TaskManager. | | + | | | | + | | **-yt,--yarnship **: Specifies the path of the file to be sent. | | + | | | | + | | **-ytm,--yarntaskManagerMemory **: Sets the TaskManager memory (MB). | | + | | | | + | | **-yz,--yarnzookeeperNamespace **: Specifies the namespace of ZooKeeper. The value must be the same as the value of **yarn-session.sh -z**. | | + | | | | + | | **-h**: Gets help information. | | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink info | **-c,--class **: Specifies a class as the entry for running programs. | Display the execution plan (JSON) of the running program. | + | | | | + | | **-p,--parallelism **: Specifies the DOP for running programs. | | + | | | | + | | **-h**: Gets help information. 
| | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink list | **-a,--all**: displays all jobs. | Query running programs in the cluster. | + | | | | + | | **-m,--jobmanager **: specifies the JobManager. | | + | | | | + | | **-r,--running:** displays only jobs in the running state. | | + | | | | + | | **-s,--scheduled**: displays only jobs in the scheduled state. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-yid,--yarnapplicationId **: binds a job to a Yarn session. | | + | | | | + | | **-h**: gets help information. | | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink stop | **-d,--drain**: sends MAX_WATERMARK before the savepoint is triggered and the job is stopped. | Forcibly stop a running job (only streaming jobs are supported. **StoppableFunction** needs to be implemented on the source side in service code). | + | | | | + | | **-p,--savepointPath **: path for storing savepoints. The default value is **state.savepoints.dir**. | | + | | | | + | | **-m,--jobmanager **: specifies the JobManager. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-yid,--yarnapplicationId **: binds a job to a Yarn session. | | + | | | | + | | **-h**: gets help information. | | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink cancel | **-m,--jobmanager **: specifies the JobManager. | Cancel a running job. | + | | | | + | | **-s,--withSavepoint **: triggers a savepoint when a job is canceled. The default directory is **state.savepoints.dir**. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-yid,--yarnapplicationId **: binds a job to a Yarn session. | | + | | | | + | | **-h**: gets help information. 
| | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flink savepoint | **-d,--dispose **: specifies a directory for storing the savepoint. | Trigger a savepoint. | + | | | | + | | **-m,--jobmanager **: specifies the JobManager. | | + | | | | + | | **-z,--zookeeperNamespace **: specifies the namespace of ZooKeeper. | | + | | | | + | | **-yid,--yarnapplicationId **: binds a job to a Yarn session. | | + | | | | + | | **-h**: gets help information. | | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | **source** *Client installation directory*\ **/bigdata_env** | None | Import client environment variables. | + | | | | + | | | Restriction: If the user uses a custom script (for example, **A.sh**) and runs this command in the script, variables cannot be imported to the **A.sh** script. If variables need to be imported to the custom script **A.sh**, the user needs to use the secondary calling method. | + | | | | + | | | For example, first call the **B.sh** script in the **A.sh** script, and then run this command in the **B.sh** script. Parameters can be imported to the **A.sh** script but cannot be imported to the **B.sh** script. | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | start-scala-shell.sh | local \| remote \| yarn: running mode | Start the scala shell. 
| + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | sh generate_keystore.sh | ``-`` | Run the **generate_keystore.sh** script to generate security cookie, **flink.keystore**, and **flink.truststore**. You need to enter a user-defined password that does not contain number signs (#). | + +--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/deleting_residual_information_about_flink_tasks.rst b/doc/component-operation-guide-lts/source/using_flink/deleting_residual_information_about_flink_tasks.rst new file mode 100644 index 0000000..5264269 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/deleting_residual_information_about_flink_tasks.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_24256.html + +.. _mrs_01_24256: + +Deleting Residual Information About Flink Tasks +=============================================== + +Scenario +-------- + +If a Flink task stops unexpectedly, some directories may reside in the ZooKeeper and HDFS services. To delete the residual directories, set **ClearUpEnabled** to **true**. + +Prerequisites +------------- + +A FlinkServer instance has been installed in a cluster and is running properly. + +Procedure +--------- + +#. Log in to Manager. + +#. Choose **Cluster** > **Services** > **Flink**. On the displayed page, click the **Configurations** tab and then **All Configurations**, search for parameter **ClearUpEnabled**, and set it to **true**. For details about related parameters, see :ref:`Table 1 `. + + .. _mrs_01_24256__en-us_topic_0000001219029127_table133021410550: + + .. table:: **Table 1** Parameters for deleting the residual directories + + +-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+ + | Parameter | Description | Default Value | Value Range | + +===============================+=================================================================================================================================================================+===============+==================+ + | ClearUpEnabled | Specifies whether to delete the residual directories. Set this parameter to **true** if residual directories need to be deleted; set it to **false** otherwise. 
| true | true and false | + +-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+ + | ClearUpPeriod | Specifies the period for deleting residual directories, in minutes. | 1440 | 1440~2147483647 | + +-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+ + | TrashDirectoryRetentionPeriod | Specifies the period for retaining residual directories, in minutes. | 10080 | 10080~2147483647 | + +-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+ + +#. Click **Save**. + + .. important:: + + - This function deletes only the residual directories in the **/flink_base** directory of the ZooKeeper service and the **/flink/recovery** directory of the HDFS service. User-defined directories will not be deleted. + - This function does not delete the **checkpoints** directory of the HDFS service. You need to manually delete it. diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/blob.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/blob.rst new file mode 100644 index 0000000..4421c2e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/blob.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1567.html + +.. _mrs_01_1567: + +Blob +==== + +Scenarios +--------- + +The Blob server on the JobManager node is used to receive JAR files uploaded by users on the client, send JAR files to TaskManager, and transfer log files. Flink provides some items for configuring the Blob server. You can configure them in the **flink-conf.yaml** configuration file. + +Configuration Description +------------------------- + +Users can configure the port, SSL, retry times, and concurrency. + +.. table:: **Table 1** Parameters + + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | Parameter | Description | Default Value | Mandatory | + +========================================+================================================================================================================================================================+================+===========+ + | blob.server.port | Blob server port | 32456 to 32520 | No | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | blob.service.ssl.enabled | Indicates whether to enable the encryption for the blob transmission channel. This parameter is valid only when the global switch **security.ssl** is enabled. 
| true | Yes | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | blob.fetch.retries | Number of times that TaskManager tries to download blob files from JobManager. | 50 | No | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | blob.fetch.num-concurrent | Number of concurrent tasks for downloading blob files supported by JobManager. | 50 | No | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | blob.fetch.backlog | Number of blob files, such as **.jar** files, to be downloaded in the queue supported by JobManager. The unit is count. | 1000 | No | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ + | library-cache-manager.cleanup.interval | Interval at which JobManager deletes the JAR files stored on the HDFS when the user cancels the Flink job. The unit is second. | 3600 | No | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/configuring_parameter_paths.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/configuring_parameter_paths.rst new file mode 100644 index 0000000..554b7cb --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/configuring_parameter_paths.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1565.html + +.. _mrs_01_1565: + +Configuring Parameter Paths +=========================== + +All parameters of Flink must be set on a client. The path of a configuration file is as follows: **Client installation path/Flink/flink/conf/flink-conf.yaml**. + +.. note:: + + - You are advised to modify the **flink-conf.yaml** configuration file on the client. The configuration format of the YAML file is *key*: *value*. + + Example: **taskmanager.heap.size: 1024mb** + + Note that a space is required between *key*\ **:** and *value*. + + - If parameters are modified in the Flink service configuration, you need to download and install the client again after the configuration is complete. diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/distributed_coordination_via_akka.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/distributed_coordination_via_akka.rst new file mode 100644 index 0000000..b989952 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/distributed_coordination_via_akka.rst @@ -0,0 +1,58 @@ +:original_name: mrs_01_1568.html + +.. 
_mrs_01_1568: + +Distributed Coordination (via Akka) +=================================== + +Scenarios +--------- + +The Akka actor model is the basis of communications between the Flink client and JobManager, JobManager and TaskManager, as well as TaskManager and TaskManager. Flink enables you to configure the Akka connection parameters in the **flink-conf.yaml** file based on the network environment or optimization policy. + +Configuration Description +------------------------- + +You can configure timeout settings of message sending and waiting, and the Akka listening mechanism Deathwatch. + +.. table:: **Table 1** Parameters + + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | Parameter | Description | Default Value | Mandatory | + +=================================================+============================================================================================================================================================================================================================================================================================+=====================================================================+===========+ + | akka.ask.timeout | Timeout duration of Akka asynchronous and block requests. If a Flink timeout failure occurs, this value can be increased. Timeout occurs when the machine processing speed is slow or the network is blocked. The unit is ms/s/m/h/d. | 10s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.lookup.timeout | Timeout duration for JobManager actor object searching. The unit is ms/s/m/h/d. | 10s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.framesize | Maximum size of the message transmitted between JobManager and TaskManager. If a Flink error occurs because the message exceeds this limit, the value can be increased. The unit is b/B/KB/MB. | 10485760b | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.watch.heartbeat.interval | Heartbeat interval at which the Akka DeathWatch mechanism detects disconnected TaskManager. 
If TaskManager is frequently and incorrectly marked as disconnected due to heartbeat loss or delay, the value can be increased. The unit is ms/s/m/h/d. | 10s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.watch.heartbeat.pause | Acceptable heartbeat pause for Akka DeathWatch mechanism. A small value indicates that irregular heartbeat is not accepted. The unit is ms/s/m/h/d. | 60s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.watch.threshold | DeathWatch failure detection threshold. A small value may mark normal TaskManager as failed and a large value increases failure detection time. | 12 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.tcp.timeout | Timeout duration of Transmission Control Protocol (TCP) connection request. If TaskManager connection timeout occurs frequently due to the network congestion, the value can be increased. The unit is ms/s/m/h/d. | 20s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.throughput | Number of messages processed by Akka in batches. After an operation, the processing thread is returned to the thread pool. A small value indicates the fair scheduling for actor message processing. A large value indicates improved overall performance but lowered scheduling fairness. | 15 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.log.lifecycle.events | Switch of Akka remote time logging, which can be enabled for debugging. 
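+
+The Akka items in this table follow the same *key*: *value* format in **flink-conf.yaml**. A minimal sketch, shown with the documented default values rather than tuning recommendations::
+
+   # Akka communication settings (values mirror the defaults in this table)
+   akka.ask.timeout: 10s
+   akka.framesize: 10485760b
+   akka.watch.heartbeat.interval: 10s
+   akka.watch.heartbeat.pause: 60s
+
+Increase these timeouts or sizes only in the situations described in the corresponding rows, for example when failures are caused by slow machines, network congestion, or oversized messages.
+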
| false | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.startup-timeout | Timeout interval before a remote component fails to be started. The value must contain a time unit (ms/s/min/h/d). | The default value is the same as the value of **akka.ask.timeout**. | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.ssl.enabled | Switch of Akka communication SSL. This parameter is valid only when the global switch **security.ssl** is enabled. | true | Yes | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.client-socket-worker-pool.pool-size-factor | Factor that is used to determine the thread pool size. The pool size is calculated based on the following formula: ceil (available processors \* factor). The size is bounded by the **pool-size-min** and **pool-size-max** values. | 1.0 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.client-socket-worker-pool.pool-size-max | Maximum number of threads calculated based on the factor. | 2 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.client-socket-worker-pool.pool-size-min | Minimum number of threads calculated based on the factor. | 1 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.client.timeout | Timeout duration of the client. 
The value must contain a time unit (ms/s/min/h/d). | 60s | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.server-socket-worker-pool.pool-size-factor | Factor that is used to determine the thread pool size. The pool size is calculated based on the following formula: ceil (available processors \* factor). The size is bounded by the **pool-size-min** and **pool-size-max** values. | 1.0 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.server-socket-worker-pool.pool-size-max | Maximum number of threads calculated based on the factor. | 2 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ + | akka.server-socket-worker-pool.pool-size-min | Minimum number of threads calculated based on the factor. | 1 | No | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+-----------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/environment.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/environment.rst new file mode 100644 index 0000000..6588f89 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/environment.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_1576.html + +.. _mrs_01_1576: + +Environment +=========== + +Scenario +-------- + +In scenarios raising special requirements on JVM configuration, users can use configuration items to transfer JVM parameters to the client, JobManager, and TaskManager. + +Configuration +------------- + +Configuration items include JVM parameters. + +.. 
table:: **Table 1** Parameter description + + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ + | Parameter | Description | Default Value | Mandatory | + +===============+=========================================================================================================================================================+=================================================================================================================================================================================================================================================================================================================================================================================================================================================+===========+ + | env.java.opts | JVM parameter, which is transferred to the startup script, JobManager, TaskManager, and Yarn client. For example, transfer remote debugging parameters. | -Xloggc:/gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M -Djdk.tls.ephemeralDHKeySize=2048 -Djava.library.path=${HADOOP_COMMON_HOME}/lib/native -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false -Dbeetle.application.home.path=\ *$BIGDATA_HOME*/common/runtime/security/config | No | + +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/file_systems.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/file_systems.rst new file mode 100644 index 0000000..4a77a58 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/file_systems.rst @@ -0,0 +1,29 @@ +:original_name: mrs_01_1572.html + +.. _mrs_01_1572: + +File Systems +============ + +Scenario +-------- + +Result files are created when tasks are running. Flink enables you to configure parameters for file creation. + +Configuration Description +------------------------- + +Configuration items include overwriting policy and directory creation. + +.. 
table:: **Table 1** Parameter description + + +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +===================================+====================================================================================================================================================================================================================================+=================+=================+ + | fs.overwrite-files | Whether to overwrite the existing file by default when the file is written. | false | No | + +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+ + | fs.output.always-create-directory | When the degree of parallelism (DOP) of file writing programs is greater than 1, a directory is created under the output file path and different result files (one for each parallel writing program) are stored in the directory. | false | No | + | | | | | + | | - If this parameter is set to **true**, a directory is created for the writing program whose DOP is 1 and a result file is stored in the directory. | | | + | | - If this parameter is set to **false**, the file of the writing program whose DOP is 1 is created directly in the output path and no directory is created. | | | + +-----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ha.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ha.rst new file mode 100644 index 0000000..75fa698 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ha.rst @@ -0,0 +1,61 @@ +:original_name: mrs_01_1575.html + +.. _mrs_01_1575: + +HA +== + +Scenarios +--------- + +The Flink HA mode depends on ZooKeeper. Therefore, ZooKeeper-related configuration items must be set. + +Configuration Description +------------------------- + +Configuration items include the ZooKeeper address, path, and security certificate. + +.. _mrs_01_1575__en-us_topic_0000001219230485_ta903d6a9c6d24f72abdf46625096cd8c: + +.. 
table:: **Table 1** Parameters + + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +=======================================================+==============================================================================================================================================================+========================================================================================+=================+ + | high-availability | Whether HA is enabled. Only the following two modes are supported currently: | zookeeper | No | + | | | | | + | | #. none: Only a single JobManager is running. The checkpoint is disabled for JobManager. | | | + | | #. ZooKeeper: | | | + | | | | | + | | - In non-Yarn mode, multiple JobManagers are supported and the leader JobManager is elected. | | | + | | - In Yarn mode, only one JobManager exists. | | | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.quorum | ZooKeeper quorum address. | Automatic configuration | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.path.root | Root directory that Flink creates on ZooKeeper, storing metadata required in HA mode. | /flink | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.storageDir | Directory for storing JobManager metadata of state backend. ZooKeeper stores only pointers to actual data. | hdfs:///flink/recovery | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.client.session-timeout | Session timeout duration on the ZooKeeper client. The unit is millisecond. | 60000 | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.client.connection-timeout | Connection timeout duration on the ZooKeeper client. The unit is millisecond. 
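+
+A minimal sketch of the corresponding **flink-conf.yaml** entries is shown below. The values mirror the defaults in this table, except for the ZooKeeper quorum, whose host names are placeholders::
+
+   # HA settings (default values; quorum hosts are placeholders)
+   high-availability: zookeeper
+   high-availability.zookeeper.quorum: zk-host1:2181,zk-host2:2181,zk-host3:2181
+   high-availability.zookeeper.path.root: /flink
+   high-availability.storageDir: hdfs:///flink/recovery
+   high-availability.zookeeper.client.session-timeout: 60000
+
+As noted in the table, the quorum address is configured automatically in a normal installation, so it usually does not need to be edited by hand.
+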
| 15000 | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.client.retry-wait | Retry waiting time on the ZooKeeper client. The unit is millisecond. | 5000 | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.client.max-retry-attempts | Maximum retry times on the ZooKeeper client. | 3 | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.job.delay | Delay of job restart when JobManager recovers. | The default value is the same as the value of **akka.ask.timeout**. | No | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | high-availability.zookeeper.client.acl | ACL (open creator) of the ZooKeeper node. For ACL options, see https://zookeeper.apache.org/doc/r3.5.1-alpha/zookeeperProgrammers.html#sc_BuiltinACLSchemes. | This parameter is configured automatically according to the cluster installation mode. | Yes | + | | | | | + | | | - Security mode: The default value is **creator**. | | + | | | - Non-security mode: The default value is **open**. | | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | zookeeper.sasl.disable | Simple authentication and security layer (SASL)-based certificate enable switch. | This parameter is configured automatically according to the cluster installation mode. | Yes | + | | | | | + | | | - Security mode: The default value is **false**. | | + | | | - Non-security mode: The default value is **true**. | | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | zookeeper.sasl.service-name | - If the ZooKeeper server configures a service whose name is different from **ZooKeeper**, this configuration item can be set. | zookeeper | Yes | + | | - If service names on the client and server are inconsistent, authentication fails. 
| | | + +-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/index.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/index.rst new file mode 100644 index 0000000..8fc4b55 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/index.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_0592.html + +.. _mrs_01_0592: + +Flink Configuration Management +============================== + +- :ref:`Configuring Parameter Paths ` +- :ref:`JobManager & TaskManager ` +- :ref:`Blob ` +- :ref:`Distributed Coordination (via Akka) ` +- :ref:`SSL ` +- :ref:`Network communication (via Netty) ` +- :ref:`JobManager Web Frontend ` +- :ref:`File Systems ` +- :ref:`State Backend ` +- :ref:`Kerberos-based Security ` +- :ref:`HA ` +- :ref:`Environment ` +- :ref:`Yarn ` +- :ref:`Pipeline ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_parameter_paths + jobmanager_&_taskmanager + blob + distributed_coordination_via_akka + ssl + network_communication_via_netty + jobmanager_web_frontend + file_systems + state_backend + kerberos-based_security + ha + environment + yarn + pipeline diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_&_taskmanager.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_&_taskmanager.rst new file mode 100644 index 0000000..42e5051 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_&_taskmanager.rst @@ -0,0 +1,94 @@ +:original_name: mrs_01_1566.html + +.. _mrs_01_1566: + +JobManager & TaskManager +======================== + +Scenarios +--------- + +JobManager and TaskManager are main components of Flink. You can configure the parameters for different security and performance scenarios on the client. + +Configuration Description +------------------------- + +Main configuration items include communication port, memory management, connection retry, and so on. + +.. 
table:: **Table 1** Parameters + + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +======================================================+=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+===============================================================================================================================================+=================+ + | taskmanager.rpc.port | IPC port range of TaskManager | 32326-32390 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | client.rpc.port | Akka system listening port on the Flink client. 
| 32651-32720 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.data.port | Data exchange port range of TaskManager | 32391-32455 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.data.ssl.enabled | Whether to enable secure sockets layer (SSL) encryption for data transfer between TaskManagers. This parameter is valid only when the global switch **security.ssl** is enabled. | false | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | jobmanager.heap.size | Size of the heap memory of JobManager. In **yarn-session** mode, the value can be transmitted by only the **-jm** parameter. In **yarn-cluster** mode, the value can be transmitted by only the **-yjm** parameter. If the value is smaller than **yarn.scheduler.minimum-allocation-mb** in the Yarn configuration file, the Yarn configuration value is used. Unit: B/KB/MB/GB/TB. 
| 1024mb | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.heap.size | Size of the heap memory of TaskManager. In **yarn-session** mode, the value can be transmitted by only the **-tm** parameter. In **yarn-cluster** mode, the value can be transmitted by only the **-ytm** parameter. If the value is smaller than **yarn.scheduler.minimum-allocation-mb** in the Yarn configuration file, the Yarn configuration value is used. The unit is B/KB/MB/GB/TB. | 1024mb | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.numberOfTaskSlots | Number of slots occupied by TaskManager. Generally, the value is configured as the number of cores of the Node. In **yarn-session** mode, the value can be transmitted by only the **-s** parameter. In **yarn-cluster** mode, the value can be transmitted by only the **-ys** parameter. 
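+
+The basic JobManager and TaskManager sizing items can be written in **flink-conf.yaml** as in the following minimal sketch; the values are the documented defaults, and on Yarn the heap sizes and slot count are normally passed through the **-jm**/**-yjm**, **-tm**/**-ytm**, and **-s**/**-ys** options instead, as described in the corresponding rows::
+
+   # JobManager/TaskManager sizing (values mirror the defaults in this table)
+   jobmanager.heap.size: 1024mb
+   taskmanager.heap.size: 1024mb
+   taskmanager.numberOfTaskSlots: 1
+   parallelism.default: 1
+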
| 1 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | parallelism.default | Default degree of parallelism, which is used for jobs for which the degree of parallelism is not specified | 1 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.numberOfBuffers | Number of TaskManager network transmission buffer stacks. If an error indicates insufficient system buffer, increase the parameter value. | 2048 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.memory.fraction | Ratio of JVM heap memory that TaskManager reserves for sorting, hash tables, and caching of intermediate results. 
| 0.7 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.memory.off-heap | Whether TaskManager uses off-heap memory for sorting, hash tables and intermediate status. You are advised to enable this item for large memory needs to improve memory operation efficiency. | false | Yes | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.memory.segment-size | Size of the memory buffer used by the memory manager and network stack The unit is bytes. | 32768 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.memory.preallocate | Whether TaskManager allocates reserved memory space upon startup. You are advised to enable this item when off-heap memory is used. 
| false | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.debug.memory.startLogThread | Enable this item for debugging Flink memory and garbage collection (GC)-related problems. TaskManager periodically collects memory and GC statistics, including the current utilization of heap and off-heap memory pools and GC time. | false | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.debug.memory.logIntervalMs | Interval at which TaskManager periodically collects memory and GC statistics. | 0 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.maxRegistrationDuration | Maximum duration of TaskManager registration on JobManager. If the actual duration exceeds the value, TaskManager is disabled. 
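+
+If the advice in the memory-related rows above is followed for workloads with large memory needs, the settings might look like the following sketch; enabling off-heap memory and preallocation here only illustrates that advice and is not a general recommendation::
+
+   # TaskManager memory management (illustrative values)
+   taskmanager.memory.fraction: 0.7
+   taskmanager.memory.segment-size: 32768
+   taskmanager.memory.off-heap: true
+   taskmanager.memory.preallocate: true
+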
| 5 min | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.initial-registration-pause | Initial interval between two consecutive registration attempts. The value must contain a time unit (ms/s/min/h/d), for example, 5 seconds. | 500ms | No | + | | | | | + | | | .. note:: | | + | | | | | + | | | The time value and unit are separated by half-width spaces. ms/s/m/h/d indicates millisecond, second, minute, hour, and day, respectively. | | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.max-registration-pause | Maximum registration retry interval in case of TaskManager registration failures. The unit is ms/s/m/h/d. | 30s | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.refused-registration-pause | Retry interval when a TaskManager registration connection is rejected by JobManager. The unit is ms/s/m/h/d. 
| 10s | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | task.cancellation.interval | Interval between two successive task cancellation attempts. The unit is millisecond. | 30000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | classloader.resolve-order | Class resolution policies defined when classes are loaded from user codes, which means whether to first check the user code JAR file (**child-first**) or the application class path (**parent-first**). The default setting indicates that the class is first loaded from the user code JAR file, which means that the user code JAR file can contain and load dependencies that are different from those used by Flink. | child-first | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | slot.idle.timeout | Timeout for an idle slot in Slot Pool, in milliseconds. 
| 50000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | slot.request.timeout | Timeout for requesting a slot from Slot Pool, in milliseconds. | 300000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | task.cancellation.timeout | Timeout of task cancellation, in milliseconds. If a task cancellation times out, a fatal TaskManager error may occur. If this parameter is set to **0**, no error is reported when a task cancellation times out. | 180000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.detailed-metrics | Indicates whether to enable the detailed metrics monitoring of network queue lengths. 
| false | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.memory.buffers-per-channel | Maximum number of network buffers used by each output/input channel (sub-partition/incoming channel). In credit-based flow control mode, this indicates how much credit is in each input channel. It should be configured with at least 2 buffers to deliver good performance. One buffer is used to receive in-flight data in the sub-partition, and the other for parallel serialization. | 2 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.memory.floating-buffers-per-gate | Number of extra network buffers used by each output gate (result partition) or input gate, indicating the amount of floating credit shared among all input channels in credit-based flow control mode. Floating buffers are distributed based on the backlog feedback (real-time output buffers in sub-partitions) and can help mitigate back pressure caused by unbalanced data distribution among sub-partitions. Increase this value if the round-trip time between nodes is long and/or the number of machines in the cluster is large. | 8 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.memory.fraction | Ratio of JVM memory used for network buffers, which determines how many streaming data exchange channels a TaskManager can have at the same time and the extent of channel buffering. 
Increase this value or the values of **taskmanager.network.memory.min** and **taskmanager.network.memory.max** if the job is rejected or a warning indicating that the system does not have enough buffers is received. Note that the values of **taskmanager.network.memory.min** and **taskmanager.network.memory.max** may overwrite this value. | 0.1 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.memory.max | Maximum memory size of the network buffer. The value must contain a unit (B/KB/MB/GB/TB). | 1 GB | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.memory.min | Minimum memory size of the network buffer. The value must contain a unit (B/KB/MB/GB/TB). | 64 MB | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.request-backoff.initial | Minimum backoff for partition requests of input channels. 
| 100 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.network.request-backoff.max | Maximum backoff for partition requests of input channels. | 10000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | taskmanager.registration.timeout | Timeout for TaskManager registration. TaskManager will be terminated if it is not successfully registered within the specified time. The value must contain a time unit (ms/s/min/h/d). | 5 min | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | resourcemanager.taskmanager-timeout | Timeout interval for releasing an idle TaskManager, in milliseconds. 
| 30000 | No | + +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_web_frontend.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_web_frontend.rst new file mode 100644 index 0000000..8ce1636 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_web_frontend.rst @@ -0,0 +1,92 @@ +:original_name: mrs_01_1571.html + +.. _mrs_01_1571: + +JobManager Web Frontend +======================= + +Scenarios +--------- + +When JobManager is started, the web server in the same process is also started. + +- You can access the web server to obtain information about the current Flink cluster, including information about JobManager, TaskManager, and running jobs in the cluster. +- You can configure parameters of the web server. + +Configuration Description +------------------------- + +Configuration items include the port, temporary directory, display items, error redirection, and security-related items. + +.. table:: **Table 1** Parameters + + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +===================================================+==========================================================================================================================================+===============================================================================+=================+ + | flink.security.enable | When installing a Flink cluster, you are required to select **security mode** or **normal mode**. | The value is automatically configured based on the cluster installation mode. | No | + | | | | | + | | - If **security mode** is selected, the value of **flink.security.enable** is automatically set to **true**. | | | + | | - If **normal mode** is selected, the value of **flink.security.enable** is automatically set to **false**. | | | + | | | | | + | | If you want to checker whether Flink cluster is in security mode or normal mode, view the value of **flink.security.enable**. | | | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.bind-port | Web port. Value range: 32261-32325. 
| 32261-32325 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.history | Number of recent jobs to be displayed. | 5 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.checkpoints.disable | Indicates whether to disable checkpoint statistics. | false | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.checkpoints.history | Number of checkpoint statistical records. | 10 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.backpressure.cleanup-interval | Interval for clearing unaccessed backpressure records. The unit is millisecond. | 600000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.backpressure.refresh-interval | Interval for updating backpressure records. The unit is millisecond. | 60000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.backpressure.num-samples | Number of stack tracing records for reverse pressure calculation. | 100 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.backpressure.delay-between-samples | Sampling interval for reverse pressure calculation. The unit is millisecond. | 50 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.ssl.enabled | Whether SSL encryption is enabled for web transmission. This parameter is valid only when the global switch **security.ssl** is enabled. 
| false | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.accesslog.enable | Switch to enable or disable web operation logs. The log is stored in **webaccess.log**. | true | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.x-frame-options | Value of the HTTP security header **X-Frame-Options**. The value can be **SAMEORIGIN**, **DENY**, or **ALLOW-FROM uri**. | DENY | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.cache-directive | Whether the web page can be cached. | no-store | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.expires-time | Expiration duration of web page cache. The unit is millisecond. | 0 | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.allow-access-address | Web access whitelist. IP addresses are separated by commas (,). Only IP addresses in the whitelist can access the web. | \* | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.access-control-allow-origin | Web page same-origin policy that prevents cross-domain attacks. | \* | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.refresh-interval | Web page refresh interval. The unit is millisecond. | 3000 | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.logout-timer | Automatic logout interval when no operation is performed. The unit is millisecond. 
| 600000 | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.403-redirect-url | Web page access error 403. If 403 error occurs, the page switch to a specified page. | Automatic configuration | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.404-redirect-url | Web page access error 404. If 404 error occurs, the page switch to a specified page. | Automatic configuration | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.415-redirect-url | Web page access error 415. If 415 error occurs, the page switch to a specified page. | Automatic configuration | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | jobmanager.web.500-redirect-url | Web page access error 500. If 500 error occurs, the page switch to a specified page. | Automatic configuration | Yes | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.await-leader-timeout | Time of the client waiting for the leader address. The unit is millisecond. | 30000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.client.max-content-length | Maximum content length that the client handles (unit: bytes). | 104857600 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.connection-timeout | Maximum time for the client to establish a TCP connection (unit: ms). | 15000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.idleness-timeout | Maximum time for a connection to stay idle before failing (unit: ms). 
| 300000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.retry.delay | The time that the client waits between retries (unit: ms). | 3000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.retry.max-attempts | The number of retry times if a retrievable operator fails. | 20 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.server.max-content-length | Maximum content length that the server handles (unit: bytes). | 104857600 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | rest.server.numThreads | Maximum number of threads for the asynchronous processing of requests. | 4 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ + | web.timeout | Timeout for web monitor (unit: ms). | 10000 | No | + +---------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/kerberos-based_security.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/kerberos-based_security.rst new file mode 100644 index 0000000..fa03833 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/kerberos-based_security.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_1574.html + +.. _mrs_01_1574: + +Kerberos-based Security +======================= + +Scenarios +--------- + +Flink Kerberos configuration items must be configured in security mode. + +Configuration Description +------------------------- + +The configuration items include **keytab**, **principal**, and **cookie** of Kerberos. + +.. 
table:: **Table 1** Parameters + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +===================================+=================================================================================================================================================================+========================================================================================+=================+ + | security.kerberos.login.keytab | Keytab file path. This parameter is a client parameter. | Configure the parameter based on actual service requirements. | Yes | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | security.kerberos.login.principal | A parameter on the client. If **security.kerberos.login.keytab** and **security.kerberos.login.principal** are both set, keytab certificate is used by default. | Configure the parameter based on actual service requirements. | No | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | security.kerberos.login.contexts | Contexts of the jass file generated by Flink. This parameter is a server parameter. | Client, KafkaClient | Yes | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | security.enable | Certificate enabling switch of the Flink internal module. This parameter is a client parameter. | This parameter is configured automatically according to the cluster installation mode. | Yes | + | | | | | + | | | - Security mode: The default value is **true**. | | + | | | - Non-security mode: The default value is **false**. | | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ + | security.cookie | Module certificate token. This parameter is a client parameter. It must be configured and cannot be left empty when **security.enable** is enabled. | Configure the parameter based on actual service requirements. 
| Yes | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/network_communication_via_netty.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/network_communication_via_netty.rst new file mode 100644 index 0000000..d401f10 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/network_communication_via_netty.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1570.html + +.. _mrs_01_1570: + +Network communication (via Netty) +================================= + +Scenario +-------- + +When Flink runs a job, data transmission and reverse pressure detection between tasks depend on Netty. In certain environments, **Netty** parameters should be configured. + +Configuration Description +------------------------- + +For advanced optimization, you can modify the following Netty configuration items. The default configuration can meet the requirements of tasks of large-scale clusters with high concurrent throughput. + +.. table:: **Table 1** Parameter description + + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | Parameter | Description | Default Value | Mandatory | + +====================================================+=======================================================================================================================================================================+===============+===========+ + | taskmanager.network.netty.num-arenas | Number of Netty memory blocks. | 1 | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | taskmanager.network.netty.server.numThreads | Number of Netty server threads | 1 | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | taskmanager.network.netty.client.numThreads | Number of Netty client threads | 1 | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | taskmanager.network.netty.client.connectTimeoutSec | Netty client connection timeout duration. Unit: second | 120 | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | taskmanager.network.netty.sendReceiveBufferSize | Size of Netty sending and receiving buffers. 
This defaults to the system buffer size (**cat /proc/sys/net/ipv4/tcp_[rw]mem**) and is 4 MB in modern Linux. Unit: byte | 4096 | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ + | taskmanager.network.netty.transport | Netty transport type, either **nio** or **epoll** | nio | No | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-----------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/pipeline.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/pipeline.rst new file mode 100644 index 0000000..b5db76d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/pipeline.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_1578.html + +.. _mrs_01_1578: + +Pipeline +======== + +Scenarios +--------- + +The Netty connection is used among multiple jobs to reduce latency. In this case, NettySink is used on the server and NettySource is used on the client for data transmission. + +Configuration Description +------------------------- + +Configuration items include NettySink information storing path, range of NettySink listening port, whether to enable SSL encryption, domain of the network used for NettySink monitoring. + +.. table:: **Table 1** Parameters + + +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------------------------------------------------------------+ + | Parameter | Description | Default Value | Mandatory | + +=============================================+=============================================================================================================================================================================+===========================================================+================================================================+ + | nettyconnector.registerserver.topic.storage | Path (on a third-party server) to information about IP address, port numbers, and concurrency of NettySink. ZooKeeper is recommended for storage. | /flink/nettyconnector | No. However, if pipeline is enabled, the feature is mandatory. | + +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------------------------------------------------------------+ + | nettyconnector.sinkserver.port.range | Port range of NettySink. | If MRS cluster is used, the default value is 28444-28843. | No. However, if pipeline is enabled, the feature is mandatory. 
| + +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------------------------------------------------------------+ + | nettyconnector.ssl.enabled | Whether SSL encryption for the communication between NettySink and NettySource is enabled. For details about the encryption key and protocol, see :ref:`SSL `. | false | No. However, if pipeline is enabled, the feature is mandatory. | + +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------------------------------------------------------------+ + | nettyconnector.message.delimiter | Delimiter used to configure the message sent by NettySink to the NettySource, which is 2-4 bytes long, and cannot contain **\\n**, **#**, or space. | The default value is **$\_**. | No. However, if pipeline is enabled, the feature is mandatory. | + +---------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------+----------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ssl.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ssl.rst new file mode 100644 index 0000000..592fa76 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/ssl.rst @@ -0,0 +1,43 @@ +:original_name: mrs_01_1569.html + +.. _mrs_01_1569: + +SSL +=== + +Scenarios +--------- + +When the secure Flink cluster is required, SSL-related configuration items must be set. + +Configuration Description +------------------------- + +Configuration items include the SSL switch, certificate, password, and encryption algorithm. + +.. table:: **Table 1** Parameters + + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | Parameter | Description | Default Value | Mandatory | + +==================================+===============================================================================+=======================================================================================================================================================+=================+ + | security.ssl.enabled | Main switch of internal communication SSL. | The value is automatically configured according to the cluster installation mode. | Yes | + | | | | | + | | | - Security mode: The default value is **true**. | | + | | | - Non-security mode: The default value is **false**. 
| | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.keystore | Java keystore file. | ``-`` | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.keystore-password | Password used to decrypt the keystore file. | ``-`` | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.key-password | Password used to decrypt the server key in the keystore file. | ``-`` | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.truststore | **truststore** file containing the public CA certificates. | ``-`` | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.truststore-password | Password used to decrypt the truststore file. | ``-`` | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.protocol | SSL transmission protocol version. | TLSv1.2 | Yes | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ + | security.ssl.algorithms | Supported SSL standard algorithm. For details, see the Java official website. 
| The default value: | Yes | + | | | | | + | | | "TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" | | + +----------------------------------+-------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/state_backend.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/state_backend.rst new file mode 100644 index 0000000..d2c1893 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/state_backend.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_1573.html + +.. _mrs_01_1573: + +State Backend +============= + +Scenarios +--------- + +Flink enables HA and job exception, as well as job pause and recovery during version upgrade. Flink depends on state backend to store job states and on the restart strategy to restart a job. You can configure state backend and the restart strategy. + +Configuration Description +------------------------- + +Configuration items include the state backend type, storage path, and restart strategy. + +.. table:: **Table 1** Parameters + + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Parameter | Description | Default Value | Mandatory | + +=========================================================+=====================================================================================================================================================================+================================================================================================================================================+============================+ + | state.backend.fs.checkpointdir | Path when the backend is set to **filesystem**. The path must be accessible by JobManager. Only the local mode is supported. In the cluster mode, use an HDFS path. | hdfs:///flink/checkpoints | No | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | state.savepoints.dir | Savepoint storage directory used by Flink to restore and update jobs. When a savepoint is triggered, the metadata of the savepoint is saved to this directory. 
| hdfs:///flink/savepoint | Mandatory in security mode | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy | Default restart policy, which is used for jobs for which no restart policy is specified. The options are as follows: | none | No | + | | | | | + | | - fixed-delay | | | + | | - failure-rate | | | + | | - none | | | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy.fixed-delay.attempts | Number of retry times when the fixed-delay restart strategy is used. | - If the checkpoint is enabled, the default value is the value of **Integer.MAX_VALUE**. | No | + | | | - If the checkpoint is disabled, the default value is 3. | | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy.fixed-delay.delay | Retry interval when the fixed-delay strategy is used. The unit is ms/s/m/h/d. | - If the checkpoint is enabled, the default value is 10s. | No | + | | | - If the checkpoint is disabled, the default value is the value of **akka.ask.timeout**. | | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy.failure-rate.max-failures-per-interval | Maximum number of restart times in a specified period before a job fails when the fault rate policy is used. | 1 | No | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy.failure-rate.failure-rate-interval | Retry interval when the failure-rate strategy is used. The unit is ms/s/m/h/d. 
| 60 s | No | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | restart-strategy.failure-rate.delay | Retry interval when the failure-rate strategy is used. The unit is ms/s/m/h/d. | The default value is the same as the value of **akka.ask.timeout**. For details, see :ref:`Distributed Coordination (via Akka) `. | No | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/yarn.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/yarn.rst new file mode 100644 index 0000000..aff02f8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/yarn.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_1577.html + +.. _mrs_01_1577: + +Yarn +==== + +Scenario +-------- + +Flink runs on a Yarn cluster and JobManager runs on ApplicationMaster. Certain configuration parameters of JobManager depend on Yarn. By setting Yarn-related configuration items, Flink is enabled to run better on Yarn. + +Configuration Description +------------------------- + +The configuration items include the memory, virtual kernel, and port of the Yarn container. + +.. table:: **Table 1** Parameter description + + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ + | Parameter | Description | Default Value | Mandatory | + +================================+===============================================================================================================================================================================================================================================================================+=======================================================+===========+ + | yarn.maximum-failed-containers | Maximum number of containers the system is going to reallocate in case of a container failure of TaskManager The default value is the number of TaskManagers when the Flink cluster is started. | 5 | No | + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ + | yarn.application-attempts | Number of ApplicationMaster restarts. 
The value is the maximum value in the validity interval that is set to Akka's timeout in Flink. After the restart, the IP address and port number of ApplicationMaster will change and you will need to connect to the client manually. | 2 | No | + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ + | yarn.heartbeat-delay | Time between heartbeats with the ApplicationMaster and Yarn ResourceManager in seconds. Unit: second | 5 | No | + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ + | yarn.containers.vcores | Number of virtual cores of each Yarn container | The default value is the number of TaskManager slots. | No | + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ + | yarn.application-master.port | ApplicationMaster port number setting. A port number range is supported. | 32586-32650 | No | + +--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+-----------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_log_overview.rst b/doc/component-operation-guide-lts/source/using_flink/flink_log_overview.rst new file mode 100644 index 0000000..afa1e1d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_log_overview.rst @@ -0,0 +1,122 @@ +:original_name: mrs_01_0596.html + +.. _mrs_01_0596: + +Flink Log Overview +================== + +Log Description +--------------- + +**Log path:** + +- Run logs of a Flink job: **${BIGDATA_DATA_HOME}/hadoop/data${i}/nm/containerlogs/application_${appid}/container_{$contid}** + + .. note:: + + The logs of executing tasks are stored in the preceding path. After the execution is complete, the Yarn configuration determines whether these logs are gathered to the HDFS directory. + +- FlinkResource run logs: **/var/log/Bigdata/flink/flinkResource** +- FlinkServer run logs: **/var/log/Bigdata/flink/flinkserver** +- FlinkServer audit logs: **/var/log/Bigdata/audit/flink/flinkserver** + +**Log archive rules:** + +#. FlinkResource run logs: + + - By default, service logs are backed up each time when the log size reaches 20 MB. A maximum of 20 logs can be reserved without being compressed. 
+ - You can set the log size and number of compressed logs on the Manager page or modify the corresponding configuration items in **log4j-cli.properties**, **log4j.properties**, and **log4j-session.properties** in **/opt/client/Flink/flink/conf/** on the client. **/opt/client** is the client installation directory. + + .. table:: **Table 1** FlinkResource log list + + ====================== ================ ======================== + Type Name Description + ====================== ================ ======================== + FlinkResource run logs checkService.log Health check log + \ kinit.log Initialization log + \ postinstall.log Service installation log + \ prestart.log Prestart script log + \ start.log Startup log + ====================== ================ ======================== + +#. FlinkServer service logs and audit logs + + - By default, FlinkServer service logs and audit logs are backed up each time when the log size reaches 100 MB. The service logs are stored for a maximum of 30 days, and audit logs are stored for a maximum of 90 days. + - You can set the log size and number of compressed logs on the Manager page or modify the corresponding configuration items in **log4j-cli.properties**, **log4j.properties**, and **log4j-session.properties** in **/opt/client/Flink/flink/conf/** on the client. **/opt/client** is the client installation directory. + + .. table:: **Table 2** FlinkServer log list + + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | Type | Name | Description | + +========================+==============================================+===============================================================+ + | FlinkServer run logs | checkService.log | Health check log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | cleanup.log | Cleanup log file for instance installation and uninstallation | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | flink-omm-client-*IP*.log | Job startup log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | flinkserver\_\ *yyyymmdd-x*.log.gz | Service archive log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | flinkserver.log | Service log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | flinkserver---*pidxxxx*-gc.log.\ *x*.current | GC log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | kinit.log | Initialization log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | postinstall.log | Service installation log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | prestart.log | Prestart script log | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | | start.log | Startup log | + 
+------------------------+----------------------------------------------+---------------------------------------------------------------+ + |                        | stop.log                                     | Stop log                                                      | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + | FlinkServer audit logs | flinkserver_audit\_\ *yyyymmdd-x*.log.gz     | Audit archive log                                             | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + |                        | flinkserver_audit.log                        | Audit log                                                     | + +------------------------+----------------------------------------------+---------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 3 <mrs_01_0596__en-us_topic_0000001219029139_table63318572917>` describes the log levels supported by Flink. The priorities of log levels are ERROR, WARN, INFO, and DEBUG in descending order. Logs whose levels are higher than or equal to the specified level are printed. The number of printed logs decreases as the specified log level increases. + +.. _mrs_01_0596__en-us_topic_0000001219029139_table63318572917: + +.. table:: **Table 3** Log levels + + ===== ============================================================= + Level Description + ===== ============================================================= + ERROR Error information about the current event processing + WARN  Exception information about the current event processing + INFO  Normal running status information about the system and events + DEBUG System information and system debugging information + ===== ============================================================= + +To modify log levels, perform the following steps: + +#. Go to the **All Configurations** page of Flink by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. On the menu bar on the left, select the log menu of the target role. +#. Select a desired log level. +#. Save the configuration. In the displayed dialog box, click **OK** to make the configurations take effect. + +.. note:: + + - After the configuration is complete, you do not need to restart the service. Download the client again for the configuration to take effect. + - You can also change the configuration items corresponding to the log level in **log4j-cli.properties**, **log4j.properties**, and **log4j-session.properties** in **/opt/client/Flink/flink/conf/** on the client. **/opt/client** is the client installation directory. + - When a job is submitted using a client, a log file is generated in the **log** folder on the client. The default umask value is **0022**. Therefore, the default log permission is **644**. To change the file permission, you need to change the umask value. For example, to change the umask value of user **omm**: + + - Add **umask 0026** to the end of the **/home/omm/.bashrc** file. + - Run the **source /home/omm/.bashrc** command to make the file permission take effect. + +Log Format +---------- + +.. 
table:: **Table 4** Log formats + + +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +=========+========================================================================================================================================================+=====================================================================================================================================================================================================================================+ + | Run log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2019-06-27 21:30:31,778 \| INFO \| [flink-akka.actor.default-dispatcher-3] \| TaskManager container_e10_1498290698388_0004_02_000007 has started. \| org.apache.flink.yarn.YarnFlinkResourceManager (FlinkResourceManager.java:368) | + +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/index.rst new file mode 100644 index 0000000..bba6b0e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/index.rst @@ -0,0 +1,14 @@ +:original_name: mrs_01_0597.html + +.. _mrs_01_0597: + +Flink Performance Tuning +======================== + +- :ref:`Optimization DataStream ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + optimization_datastream/index diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_dop.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_dop.rst new file mode 100644 index 0000000..dc3e364 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_dop.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_1589.html + +.. _mrs_01_1589: + +Configuring DOP +=============== + +Scenario +-------- + +The degree of parallelism (DOP) indicates the number of tasks to be executed concurrently. It determines the number of data blocks after the operation. Configuring the DOP will optimize the number of tasks, data volume of each task, and the host processing capability. + +Query the CPU and memory usage. If data and tasks are not evenly distributed among nodes, increase the DOP for even distribution. + +Procedure +--------- + +Configure the DOP at one of the following layers (the priorities of which are in the descending order) based on the actual memory, CPU, data, and application logic conditions: + +- Operator + + Call the **setParallelism()** method to specify the DOP of an operator, data source, and sink. For example: + + .. 
code-block::
+
+      final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+      DataStream<String> text = [...]
+      DataStream<Tuple2<String, Integer>> wordCounts = text
+          .flatMap(new LineSplitter())
+          .keyBy(0)
+          .timeWindow(Time.seconds(5))
+          .sum(1).setParallelism(5);
+
+      wordCounts.print();
+
+      env.execute("Word Count Example");
+
+- Execution environment
+
+  Flink runs in the execution environment, which defines the default DOP for operators, data sources, and data sinks.
+
+  Call the **setParallelism()** method to specify the default DOP of the execution environment. Example:
+
+  .. code-block::
+
+      final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+      env.setParallelism(3);
+      DataStream<String> text = [...]
+      DataStream<Tuple2<String, Integer>> wordCounts = [...]
+      wordCounts.print();
+      env.execute("Word Count Example");
+
+- Client
+
+  Specify the DOP when submitting jobs to Flink on the client. If you use the CLI client, specify the DOP using the **-p** parameter. Example:
+
+  .. code-block::
+
+      ./bin/flink run -p 10 ../examples/*WordCount-java*.jar
+
+- System
+
+  On the Flink client, modify the **parallelism.default** parameter in the **flink-conf.yaml** file in the **conf** directory to specify the default DOP for all execution environments.
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst new file mode 100644 index 0000000..5e37b84 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst @@ -0,0 +1,43 @@
+:original_name: mrs_01_1590.html
+
+.. _mrs_01_1590:
+
+Configuring Process Parameters
+==============================
+
+Scenario
+--------
+
+In Flink on Yarn mode, there are JobManagers and TaskManagers, which schedule and run tasks.
+
+Therefore, configuring the parameters of JobManagers and TaskManagers can optimize the execution performance of a Flink application. Perform the following steps to optimize the Flink cluster performance.
+
+Procedure
+---------
+
+#. Configure JobManager memory.
+
+   JobManagers are responsible for task scheduling and for message communication between TaskManagers and ResourceManagers. JobManager memory needs to be increased as the number of tasks and the DOP increase.
+
+   Configure JobManager memory based on the number of tasks.
+
+   - When running the **yarn-session** command, add the **-jm MEM** parameter to configure the memory.
+   - When running the **yarn-cluster** command, add the **-yjm MEM** parameter to configure the memory.
+
+#. Configure the number of TaskManagers.
+
+   Each core of a TaskManager can run one task at a time. Increasing the number of TaskManagers has the same effect as increasing the DOP. Therefore, when resources are sufficient, you can increase the number of TaskManagers to improve efficiency.
+
+#. Configure the number of TaskManager slots.
+
+   Multiple cores of a TaskManager can process multiple tasks at the same time. This has the same effect as increasing the DOP. However, the balance between the number of cores and the memory must be maintained, because all cores of a TaskManager share the memory.
+
+   - When running the **yarn-session** command, add the **-s NUM** parameter to configure the number of slots.
+   - When running the **yarn-cluster** command, add the **-ys NUM** parameter to configure the number of slots.
+
+#. Configure TaskManager memory.
+
+   TaskManager memory is used for task execution and communication. A large task requires more resources. In this case, you can increase the memory.
+
+   - When running the **yarn-session** command, add the **-tm MEM** parameter to configure the memory.
+   - When running the **yarn-cluster** command, add the **-ytm MEM** parameter to configure the memory.
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_the_netty_network_communication.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_the_netty_network_communication.rst new file mode 100644 index 0000000..fcc517e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_the_netty_network_communication.rst @@ -0,0 +1,22 @@
+:original_name: mrs_01_1592.html
+
+.. _mrs_01_1592:
+
+Configuring the Netty Network Communication
+===========================================
+
+Scenarios
+---------
+
+Flink communication is based on the Netty network. The network performance determines the data exchange speed and task execution efficiency. Therefore, the performance of Flink can be optimized by optimizing the Netty network.
+
+Procedure
+---------
+
+In the **conf/flink-conf.yaml** file on the client, change the following configurations as required. Exercise caution when changing the default values, because they are already optimal.
+
+- **taskmanager.network.netty.num-arenas**: specifies the number of Netty arenas. The default value is **taskmanager.numberOfTaskSlots**.
+- **taskmanager.network.netty.server.numThreads** and **taskmanager.network.netty.client.numThreads**: specify the number of threads on the server and client. The default value is **taskmanager.numberOfTaskSlots**.
+- **taskmanager.network.netty.client.connectTimeoutSec**: specifies the connection timeout interval of the TaskManager client. The default value is **120s**.
+- **taskmanager.network.netty.sendReceiveBufferSize**: specifies the buffer size of the Netty network. The default value is the system buffer size (**cat /proc/sys/net/ipv4/tcp_[rw]mem**), which is usually 4 MB.
+- **taskmanager.network.netty.transport**: specifies the transmission method of the Netty network. The default value is **nio**. The value can only be **nio** or **epoll**.
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst new file mode 100644 index 0000000..03ff67a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst @@ -0,0 +1,24 @@
+:original_name: mrs_01_1587.html
+
+.. _mrs_01_1587:
+
+Optimization DataStream
+=======================
+
+- :ref:`Memory Configuration Optimization `
+- :ref:`Configuring DOP `
+- :ref:`Configuring Process Parameters `
+- :ref:`Optimizing the Design of Partitioning Method `
+- :ref:`Configuring the Netty Network Communication `
+- :ref:`Summarization `
+
+..
toctree::
+   :maxdepth: 1
+   :hidden:
+
+   memory_configuration_optimization
+   configuring_dop
+   configuring_process_parameters
+   optimizing_the_design_of_partitioning_method
+   configuring_the_netty_network_communication
+   summarization
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/memory_configuration_optimization.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/memory_configuration_optimization.rst new file mode 100644 index 0000000..d396f51 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/memory_configuration_optimization.rst @@ -0,0 +1,30 @@
+:original_name: mrs_01_1588.html
+
+.. _mrs_01_1588:
+
+Memory Configuration Optimization
+=================================
+
+Scenarios
+---------
+
+Flink computing depends on memory. If the memory is insufficient, Flink performance deteriorates greatly. One solution is to monitor garbage collection (GC) to evaluate the memory usage. If memory becomes the performance bottleneck, optimize the memory usage according to the actual situation.
+
+If **Full GC** is frequently reported in the GC monitoring of the container processes on the Yarn nodes, GC needs to be optimized.
+
+.. note::
+
+   In the **env.java.opts** configuration item of the **conf/flink-conf.yaml** file on the client, add the **-Xloggc:/gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M** parameters. The GC log is configured by default.
+
+Procedure
+---------
+
+- Optimize GC.
+
+  Adjust the ratio of tenured generation memory to young generation memory. In the **conf/flink-conf.yaml** configuration file on the client, add the **-XX:NewRatio** parameter to the **env.java.opts** configuration item. For example, **-XX:NewRatio=2** indicates that the ratio of tenured generation memory to young generation memory is 2:1, that is, the young generation memory occupies one third and the tenured generation memory occupies two thirds.
+
+- When developing Flink applications, optimize the partitioning or grouping operations on DataStream:
+
+  - If partitioning causes data skew, the partitions need to be optimized.
+  - Some DataStream operations, such as WindowAll, do not support parallelism; do not perform them concurrently.
+  - Do not set the **keyBy** key to the String type.
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/optimizing_the_design_of_partitioning_method.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/optimizing_the_design_of_partitioning_method.rst new file mode 100644 index 0000000..6dda2f6 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/optimizing_the_design_of_partitioning_method.rst @@ -0,0 +1,66 @@
+:original_name: mrs_01_1591.html
+
+.. _mrs_01_1591:
+
+Optimizing the Design of Partitioning Method
+============================================
+
+Scenarios
+---------
+
+The division of tasks can be optimized by optimizing the partitioning method. If data skew occurs in a certain task, the whole execution process is delayed. Therefore, when designing the partitioning method, ensure that partitions are assigned evenly.
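+The following minimal sketch illustrates the problem this section addresses; the class name, sample data, and parallelism are assumptions made for illustration only and are not part of the product code. Records that share a single hot key all land on one subtask, whereas an explicit **rebalance()** spreads records evenly. The partitioning methods available for this purpose are described in the procedure below.
+
+.. code-block::
+
+   import org.apache.flink.api.common.functions.MapFunction;
+   import org.apache.flink.api.java.tuple.Tuple2;
+   import org.apache.flink.streaming.api.datastream.DataStream;
+   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+   public class PartitionSkewSketch {
+       public static void main(String[] args) throws Exception {
+           final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+           env.setParallelism(4);
+
+           // Almost all records share the key "hot", which is the typical cause of data skew.
+           DataStream<Tuple2<String, Integer>> records = env.fromElements(
+                   Tuple2.of("hot", 1), Tuple2.of("hot", 2), Tuple2.of("hot", 3), Tuple2.of("cold", 4));
+
+           // Skewed: keyBy sends all records with the same key to the same subtask.
+           records.keyBy(0).sum(1).print();
+
+           // Evenly distributed: rebalance() redistributes records round-robin before the next operator.
+           records.rebalance()
+                   .map(new MapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
+                       @Override
+                       public Tuple2<String, Integer> map(Tuple2<String, Integer> value) {
+                           return Tuple2.of(value.f0, value.f1 * 2);
+                       }
+                   })
+                   .print();
+
+           env.execute("Partition Skew Sketch");
+       }
+   }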
+
+Procedure
+---------
+
+Partitioning methods are as follows:
+
+- **Random partitioning**: randomly partitions data.
+
+  .. code-block::
+
+     dataStream.shuffle();
+
+- **Rebalancing (round-robin partitioning)**: evenly partitions data based on round-robin. This partitioning method is useful for optimizing data with data skew.
+
+  .. code-block::
+
+     dataStream.rebalance();
+
+- **Rescaling**: distributes data to downstream subsets in a round-robin manner. This partitioning method is useful if you want to deliver data from each parallel instance of a data source to subsets of some mappers without the complete rebalance operation performed by rebalance().
+
+  .. code-block::
+
+     dataStream.rescale();
+
+- **Broadcast**: broadcasts data to all partitions.
+
+  .. code-block::
+
+     dataStream.broadcast();
+
+- **User-defined partitioning**: uses a user-defined partitioner to select the target task for each element. User-defined partitioning allows users to partition data based on a certain feature to achieve optimized task execution.
+
+  The following is an example:
+
+  .. code-block::
+
+     // fromElements builds a simple Tuple2 stream.
+     DataStream<Tuple2<String, Integer>> dataStream = env.fromElements(Tuple2.of("hello", 1), Tuple2.of("test", 2), Tuple2.of("world", 100));
+
+     // Defines the partitioner: the key string length plus the value, modulo the number of partitions, determines the partition ID.
+     Partitioner<Tuple2<String, Integer>> strPartitioner = new Partitioner<Tuple2<String, Integer>>() {
+         @Override
+         public int partition(Tuple2<String, Integer> key, int numPartitions) {
+             return (key.f0.length() + key.f1) % numPartitions;
+         }
+     };
+
+     // The Tuple2 data is used as the basis for partitioning.
+     dataStream.partitionCustom(strPartitioner, new KeySelector<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
+         @Override
+         public Tuple2<String, Integer> getKey(Tuple2<String, Integer> value) throws Exception {
+             return value;
+         }
+     }).print();
diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/summarization.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/summarization.rst new file mode 100644 index 0000000..41c7780 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/summarization.rst @@ -0,0 +1,30 @@
+:original_name: mrs_01_1593.html
+
+.. _mrs_01_1593:
+
+Summarization
+=============
+
+Avoiding Data Skew
+------------------
+
+If data skew occurs (the data volume of certain tasks is excessively large), the execution time of tasks is inconsistent even if no garbage collection is performed.
+
+- Redefine keys. Use keys of smaller granularity to optimize the task size.
+- Modify the DOP.
+- Call the rebalance operation to balance data partitions.
+
+Setting Timeout Interval for the Buffer
+---------------------------------------
+
+- During the execution of tasks, data is exchanged over the network. You can configure the **setBufferTimeout** parameter to specify the timeout interval for the buffer.
+
+- If **setBufferTimeout** is set to **-1**, the refresh operation is performed when the buffer is full, maximizing the throughput. If **setBufferTimeout** is set to **0**, the refresh operation is performed each time data is received, minimizing the delay. If **setBufferTimeout** is set to a value greater than **0**, the refresh operation is performed after the buffer times out.
+
+  The following is an example:
+
+  ..
code-block:: + + env.setBufferTimeout(timeoutMillis); + + env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(timeoutMillis); diff --git a/doc/component-operation-guide-lts/source/using_flink/index.rst b/doc/component-operation-guide-lts/source/using_flink/index.rst new file mode 100644 index 0000000..3603e35 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/index.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_0591.html + +.. _mrs_01_0591: + +Using Flink +=========== + +- :ref:`Using Flink from Scratch ` +- :ref:`Viewing Flink Job Information ` +- :ref:`Flink Configuration Management ` +- :ref:`Security Configuration ` +- :ref:`Security Hardening ` +- :ref:`Security Statement ` +- :ref:`Using the Flink Web UI ` +- :ref:`Deleting Residual Information About Flink Tasks ` +- :ref:`Flink Log Overview ` +- :ref:`Flink Performance Tuning ` +- :ref:`Common Flink Shell Commands ` +- :ref:`Reference ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_flink_from_scratch + viewing_flink_job_information + flink_configuration_management/index + security_configuration/index + security_hardening/index + security_statement + using_the_flink_web_ui/index + deleting_residual_information_about_flink_tasks + flink_log_overview + flink_performance_tuning/index + common_flink_shell_commands + reference/index diff --git a/doc/component-operation-guide-lts/source/using_flink/reference/example_of_issuing_a_certificate.rst b/doc/component-operation-guide-lts/source/using_flink/reference/example_of_issuing_a_certificate.rst new file mode 100644 index 0000000..9bf1bfe --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/reference/example_of_issuing_a_certificate.rst @@ -0,0 +1,320 @@ +:original_name: mrs_01_0621.html + +.. _mrs_01_0621: + +Example of Issuing a Certificate +================================ + +Generate the **generate_keystore.sh** script based on the sample code and save the script to the **bin** directory on the Flink client. + +.. code-block:: + + #!/bin/bash + + KEYTOOL = ${JAVA_HOME}/bin/keytool + KEYSTOREPATH = "$FLINK_HOME/conf/" + CA_ALIAS = "ca" + CA_KEYSTORE_NAME = "ca.keystore" + CA_DNAME = "CN=Flink_CA" + CA_KEYALG = "RSA" + CLIENT_CONF_YAML = "$FLINK_HOME/conf/flink-conf.yaml" + KEYTABPRINCEPAL = "" + + function getConf() { + if [$# - ne 2]; + then + echo "invalid parmaters for getConf" + exit 1 + fi + + confName = "$1" + if [-z "$confName"]; + then + echo "conf name is empty." + exit 2 + fi + + configFile = $FLINK_HOME / conf / client.properties + if [!-f $configFile]; + then + echo $configFile " is not exist." + exit 3 + fi + + defaultValue = "$2" + cnt = $(grep $1 $configFile | wc - l) + if [$cnt - gt 1]; + then + echo $confName " has multi values in " + $configFile + exit 4 + elif[$cnt - lt 1]; + then + echo $defaultValue + else + line = $(grep $1 $configFile) + confValue = $(echo "${line#*=}") + echo "$confValue" + fi + } + + function createSelfSignedCA() {# + varible from user input + keystorePath = $1 + storepassValue = $2 + keypassValue = $3 + + # generate ca keystore + rm - rf $keystorePath / $CA_KEYSTORE_NAME + $KEYTOOL - genkeypair - alias $CA_ALIAS - keystore $keystorePath / $CA_KEYSTORE_NAME - dname $CA_DNAME - storepass $storepassValue - keypass $keypassValue - validity 3650 - keyalg $CA_KEYALG - keysize 3072 - ext bc = ca: true + if [$ ? -ne 0]; + then + echo "generate ca.keystore failed." 
+ exit 1 + fi + + # generate ca.cer + rm - rf "$keystorePath/ca.cer" + $KEYTOOL - keystore "$keystorePath/$CA_KEYSTORE_NAME" - storepass "$storepassValue" - alias $CA_ALIAS - validity 3650 - exportcert > "$keystorePath/ca.cer" + if [$ ? -ne 0]; + then + echo "generate ca.cer failed." + exit 1 + fi + + # generate ca.truststore + rm - rf "$keystorePath/flink.truststore" + $KEYTOOL - importcert - keystore "$keystorePath/flink.truststore" - alias $CA_ALIAS - storepass "$storepassValue" - noprompt - file "$keystorePath/ca.cer" + if [$ ? -ne 0]; + then + echo "generate ca.truststore failed." + exit 1 + fi + } + + function generateKeystore() {# + get path / pass from input + keystorePath = $1 + storepassValue = $2 + keypassValue = $3 + + # get value from conf + aliasValue = $(getConf "flink.keystore.rsa.alias" + "flink") + validityValue = $(getConf "flink.keystore.rsa.validity" + "3650") + keyalgValue = $(getConf "flink.keystore.rsa.keyalg" + "RSA") + dnameValue = $(getConf "flink.keystore.rsa.dname" + "CN=flink.com") + SANValue = $(getConf "flink.keystore.rsa.ext" + "ip:127.0.0.1") + SANValue = $(echo "$SANValue" | xargs) + SANValue = "ip:$(echo " + $SANValue "| sed 's/,/,ip:/g')" + + # + generate keystore + rm - rf $keystorePath / flink.keystore + $KEYTOOL - genkeypair - alias $aliasValue - keystore $keystorePath / flink.keystore - dname $dnameValue - ext SAN = $SANValue - storepass $storepassValue - keypass $keypassValue - keyalg $keyalgValue - keysize 3072 - validity 3650 + if [$ ? -ne 0];then + echo "generate flink.keystore failed." + exit 1 + fi + + # generate cer + rm - rf $keystorePath / flink.csr + $KEYTOOL - certreq - keystore $keystorePath / flink.keystore - storepass $storepassValue - alias $aliasValue - file $keystorePath / flink.csr + if [$ ? -ne 0];then + echo "generate flink.csr failed." + exit 1 + fi + + # generate flink.cer + rm - rf $keystorePath / flink.cer + $KEYTOOL - gencert - keystore $keystorePath / ca.keystore - storepass $storepassValue - alias $CA_ALIAS - ext SAN = $SANValue - infile $keystorePath / flink.csr - outfile $keystorePath / flink.cer - validity 3650 + if [$ ? -ne 0];then + echo "generate flink.cer failed." + exit 1 + fi + + # + import cer into keystore + $KEYTOOL - importcert - keystore $keystorePath / flink.keystore - storepass $storepassValue - file $keystorePath / ca.cer - alias $CA_ALIAS - noprompt + if [$ ? -ne 0];then + echo "importcert ca." + exit 1 + fi + + $KEYTOOL - importcert - keystore $keystorePath / flink.keystore - storepass $storepassValue - file $keystorePath / flink.cer - alias $aliasValue - noprompt; + if [$ ? -ne 0];then + echo "generate flink.truststore failed." + exit 1 + fi + } + + function configureFlinkConf() {# + set config + if [-f "$CLIENT_CONF_YAML"];then + SSL_ENCRYPT_ENABLED = $(grep "security.ssl.encrypt.enabled" + "$CLIENT_CONF_YAML" | awk '{print $2}') + if ["$SSL_ENCRYPT_ENABLED" = "false"];then + + sed - i s / "security.ssl.key-password:".*/"security.ssl.key-password:"\ "${keyPass}"/g + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.ssl.key-password failed." + return 1 + fi + + sed - i s / "security.ssl.keystore-password:".*/"security.ssl.keystore-password:"\ "${storePass}"/g + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.ssl.keystore-password failed." + return 1 + fi + + sed - i s / "security.ssl.truststore-password:".*/"security.ssl.truststore-password:"\ "${storePass}"/g + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.ssl.keystore-password failed." 
+ return 1 + fi + + echo "security.ssl.encrypt.enabled is false, set security.ssl.key-password security.ssl.keystore-password security.ssl.truststore-password success." + else + echo "security.ssl.encrypt.enabled is true, please enter security.ssl.key-password security.ssl.keystore-password security.ssl.truststore-password encrypted value in flink-conf.yaml." + fi + + keystoreFilePath = "${keystorePath}" / flink.keystore + sed - i 's#' + "security.ssl.keystore:".*'#' + "security.ssl.keystore:"\ + "$keystoreFilePath" + '#g' + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.ssl.keystore failed." + return 1 + fi + + + truststoreFilePath = "${keystorePath}/flink.truststore" + sed - i 's#' + "security.ssl.truststore:".*'#' + "security.ssl.truststore:"\ + "$truststoreFilePath" + '#g' + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.ssl.truststore failed." + return 1 + fi + + command - v sha256sum > /dev/null + if [$ ? -ne 0];then + echo "sha256sum is not exist, it will produce security.cookie with date +%F-%H-%M-%s-%N." + cookie = $(date + % F - % H - % M - % s - % N) + else + cookie = "$(echo " + $ { + KEYTABPRINCEPAL + } + "| sha256sum | awk '{print $1}')" + fi + + sed - i s / "security.cookie:".*/"security.cookie:"\ "${cookie}"/g + "$CLIENT_CONF_YAML" + if [$ ? -ne 0];then + echo "set security.cookie failed." + return 1 + fi + fi + return 0; + } + + main() { + #check environment variable is set or not + if [-z $ { + FLINK_HOME + x + }]; + then + echo "errro: environment variables are not set." + exit 1 + fi + stty -echo + read -rp "Enter password:" + password + stty echo + echo + + KEYTABPRINCEPAL = $(grep "security.kerberos.login.principal" + "$CLIENT_CONF_YAML" | awk '{print $2}') + if [-z "$KEYTABPRINCEPAL"]; + then + echo "please config security.kerberos.login.principal info first." + exit 1 + fi + + + # get input + keystorePath = "$KEYSTOREPATH" + storePass = "$password" + keyPass = "$password" + + # + generate self signed CA + createSelfSignedCA "$keystorePath" + "$storePass" + "$keyPass" + if [$ ? -ne 0]; + then + echo "create self signed ca failed." + exit 1 + fi + + # generate keystore + generateKeystore "$keystorePath" + "$storePass" + "$keyPass" + if [$ ? -ne 0]; + then + echo "create keystore failed." + exit 1 + fi + + echo "generate keystore/truststore success." + + # + set flink config + configureFlinkConf "$keystorePath" + "$storePass" + "$keyPass" + if [$ ? -ne 0]; + then + echo "configure Flink failed." + exit 1 + fi + + return 0; + } + + # + the start main + main "$@" + + exit 0 + +.. note:: + + Run the **sh generate_keystore.sh** ** command. ** is user-defined. + + - If ** contains the special character **$**, use the following method to avoid the password being escaped: **sh generate_keystore.sh 'Bigdata_2013'**. + - The password cannot contain **#**. + - Before using the **generate_keystore.sh** script, run the **source bigdata_env** command in the client directory. + - When the **generate_keystore.sh** script is used, the absolute paths of **security.ssl.keystore** and **security.ssl.truststore** are automatically filled in **flink-conf.yaml**. Therefore, you need to manually change the paths to relative paths as required. Example: + + - Change **/opt/client/Flink/flink/conf//flink.keystore** to **security.ssl.keystore: ssl/flink.keystore**. + - Change **/opt/client/Flink/flink/conf//flink.truststore** to **security.ssl.truststore: ssl/flink.truststore**. + - Create the **ssl** folder in any directory on the Flink client. 
For example, create the **ssl** folder in the **/opt/client/Flink/flink/conf/** directory and save the **flink.keystore** and **flink.truststore** files to the **ssl** folder. + - When running the **yarn-session** or **flink run -m yarn-cluster** command, run the **yarn-session -t ssl -d** or **flink run -m yarn-cluster -yt ssl -d WordCount.jar** command in the same directory as the **ssl** folder. diff --git a/doc/component-operation-guide-lts/source/using_flink/reference/index.rst b/doc/component-operation-guide-lts/source/using_flink/reference/index.rst new file mode 100644 index 0000000..7a6a2ac --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/reference/index.rst @@ -0,0 +1,14 @@ +:original_name: mrs_01_0620.html + +.. _mrs_01_0620: + +Reference +========= + +- :ref:`Example of Issuing a Certificate ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + example_of_issuing_a_certificate diff --git a/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_kafka.rst b/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_kafka.rst new file mode 100644 index 0000000..5cdca23 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_kafka.rst @@ -0,0 +1,97 @@ +:original_name: mrs_01_1580.html + +.. _mrs_01_1580: + +Configuring Kafka +================= + +Sample project data of Flink is stored in Kafka. A user with Kafka permission can send data to Kafka and receive data from it. + +#. Ensure that clusters, including HDFS, Yarn, Flink, and Kafka are installed. + +#. Create a topic. + + - Run Linux command line to create a topic. Before running commands, ensure that the kinit command, for example, **kinit flinkuser**, is run for authentication. + + .. note:: + + To create a Flink user, you need to have the permission to create Kafka topics. + + The format of the command is shown as follows, in which **{zkQuorum}** indicates ZooKeeper cluster information and the format is *IP*:*port*, and **{Topic}** indicates the topic name. + + **bin/kafka-topics.sh --create --zookeeper {zkQuorum}/kafka --replication-factor 1 --partitions 5 --topic {Topic}** + + Assume the topic name is **topic 1**. The command for creating this topic is displayed as follows: + + .. code-block:: + + /opt/client/Kafka/kafka/bin/kafka-topics.sh --create --zookeeper 10.96.101.32:2181,10.96.101.251:2181,10.96.101.177:2181,10.91.8.160:2181/kafka --replication-factor 1 --partitions 5 --topic topic1 + + - Configure the permission of the topic on the server. + + Set the **allow.everyone.if.no.acl.found** parameter of Kafka Broker to **true**. + +#. Perform the security authentication. + + The Kerberos authentication, SSL encryption authentication, or Kerberos + SSL authentication mode can be used. + + - **Kerberos authentication** + + - Client configuration + + In the Flink configuration file **flink-conf.yaml**, add configurations about Kerberos authentication. For example, add **KafkaClient** in **contexts** as follows: + + .. code-block:: + + security.kerberos.login.keytab: /home/demo//keytab/flinkuser.keytab + security.kerberos.login.principal: flinkuser + security.kerberos.login.contexts: Client,KafkaClient + security.kerberos.login.use-ticket-cache: false + + - Running parameter + + Running parameters about the **SASL_PLAINTEXT** protocol are as follows: + + .. 
code-block:: + + --topic topic1 --bootstrap.servers 10.96.101.32:21007 --security.protocol SASL_PLAINTEXT --sasl.kerberos.service.name kafka //10.96.101.32:21007 indicates the IP:port of the Kafka server. + + - **SSL encryption** + + - Configuration on the server + + Log in to FusionInsight Manager, choose **Cluster** > **Services** > **Kafka** > **Configurations**, and set **Type** to **All**. Search for **ssl.mode.enable** and set it to **true**. + + - Configuration on the client + + a. Log in to FusionInsight Manager, choose **Cluster > Name of the desired cluster > Services > Kafka > More > Download Client** to download Kafka client. + + b. Use the **ca.crt** certificate file in the client root directory to generate the **truststore** file for the client. + + Run the following command: + + .. code-block:: + + keytool -noprompt -import -alias myservercert -file ca.crt -keystore truststore.jks + + The command execution result is similar to the following: + + |image1| + + c. Run parameters. + + The value of **ssl.truststore.password** must be the same as the password you entered when creating **truststore**. Run the following command to run parameters: + + .. code-block:: + + --topic topic1 --bootstrap.servers 10.96.101.32:9093 --security.protocol SSL --ssl.truststore.location /home/zgd/software/FusionInsight_Kafka_ClientConfig/truststore.jks --ssl.truststore.password XXX + + - **Kerberos+SSL** **encryption** + + After completing preceding configurations of the client and server of Kerberos and SSL, modify the port number and protocol type in running parameters to enable the Kerberos+SSL encryption mode. + + .. code-block:: + + --topic topic1 --bootstrap.servers 10.96.101.32:21009 --security.protocol SASL_SSL --sasl.kerberos.service.name kafka --ssl.truststore.location /home/zgd/software/FusionInsight_Kafka_ClientConfig/truststore.jks --ssl.truststore.password XXX + +.. |image1| image:: /_static/images/en-us_image_0000001349139389.png diff --git a/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_pipeline.rst b/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_pipeline.rst new file mode 100644 index 0000000..30f5c9e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_configuration/configuring_pipeline.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_1581.html + +.. _mrs_01_1581: + +Configuring Pipeline +==================== + +#. File configuration + + - **nettyconnector.registerserver.topic.storage**: (Mandatory) Configures the path (on a third-party server) to information about IP address, port numbers, and concurrency of NettySink. For example: + + .. code-block:: + + nettyconnector.registerserver.topic.storage: /flink/nettyconnector + + - **nettyconnector.sinkserver.port.range**: (Mandatory) Configures the range of port numbers of NettySink. For example: + + .. code-block:: + + nettyconnector.sinkserver.port.range: 28444-28843 + + - **nettyconnector.ssl.enabled**: Configures whether to enable SSL encryption between NettySink and NettySource. The default value is **false**. For example: + + .. code-block:: + + nettyconnector.ssl.enabled: true + +#. Security authentication configuration + + - SASL authentication of ZooKeeper depends on the HA configuration in the **flink-conf.yaml** file. + - SSL configurations such as keystore, truststore, keystore password, truststore password, and password inherit from **flink-conf.yaml**. For details, see :ref:`Encrypted Transmission `. 
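+
+As a reference, the following is a minimal sketch of how the mandatory file configuration items in the file configuration step above might look together in **flink-conf.yaml**. The values are only the example values shown above, not required settings:
+
+.. code-block::
+
+   nettyconnector.registerserver.topic.storage: /flink/nettyconnector
+   nettyconnector.sinkserver.port.range: 28444-28843
+   nettyconnector.ssl.enabled: true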
diff --git a/doc/component-operation-guide-lts/source/using_flink/security_configuration/index.rst b/doc/component-operation-guide-lts/source/using_flink/security_configuration/index.rst new file mode 100644 index 0000000..b198aae --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_configuration/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_0593.html + +.. _mrs_01_0593: + +Security Configuration +====================== + +- :ref:`Security Features ` +- :ref:`Configuring Kafka ` +- :ref:`Configuring Pipeline ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + security_features + configuring_kafka + configuring_pipeline diff --git a/doc/component-operation-guide-lts/source/using_flink/security_configuration/security_features.rst b/doc/component-operation-guide-lts/source/using_flink/security_configuration/security_features.rst new file mode 100644 index 0000000..3f5d270 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_configuration/security_features.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_1579.html + +.. _mrs_01_1579: + +Security Features +================= + +Security Features of Flink +-------------------------- + +- All Flink cluster components support authentication. + + - The Kerberos authentication is supported between Flink cluster components and external components, such as Yarn, HDFS, and ZooKeeper. + - The security cookie authentication between Flink cluster components, for example, Flink client and JobManager, JobManager and TaskManager, and TaskManager and TaskManager, are supported. + +- SSL encrypted transmission is supported by Flink cluster components. +- SSL encrypted transmission between Flink cluster components, for example, Flink client and JobManager, JobManager and TaskManager, and TaskManager and TaskManager, are supported. +- Following security hardening approaches for Flink web are supported: + + - Whitelist filtering. Flink web can only be accessed through Yarn proxy. + - Security header enhancement. + +- In Flink clusters, ranges of listening ports of components can be configured. +- In HA mode, ACL control is supported. diff --git a/doc/component-operation-guide-lts/source/using_flink/security_hardening/acl_control.rst b/doc/component-operation-guide-lts/source/using_flink/security_hardening/acl_control.rst new file mode 100644 index 0000000..b34b8dc --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_hardening/acl_control.rst @@ -0,0 +1,15 @@ +:original_name: mrs_01_1584.html + +.. _mrs_01_1584: + +ACL Control +=========== + +In HA mode of Flink, ZooKeeper can be used to manage clusters and discover services. Zookeeper supports SASL ACL control. Only users who have passed the SASL (Kerberos) authentication have the permission to operate files on ZooKeeper. To enable SASL ACL control, perform following configurations in the Flink configuration file. + +.. code-block:: + + high-availability.zookeeper.client.acl: creator + zookeeper.sasl.disable: false + +For details about configuration items, see :ref:`Table 1 `. diff --git a/doc/component-operation-guide-lts/source/using_flink/security_hardening/authentication_and_encryption.rst b/doc/component-operation-guide-lts/source/using_flink/security_hardening/authentication_and_encryption.rst new file mode 100644 index 0000000..55cc6f9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_hardening/authentication_and_encryption.rst @@ -0,0 +1,196 @@ +:original_name: mrs_01_1583.html + +.. 
_mrs_01_1583: + +Authentication and Encryption +============================= + +Security Authentication +----------------------- + +Flink uses the following three authentication modes: + +- Kerberos authentication: It is used between the Flink Yarn client and Yarn ResourceManager, JobManager and ZooKeeper, JobManager and HDFS, TaskManager and HDFS, Kafka and TaskManager, as well as TaskManager and ZooKeeper. +- Security cookie authentication: Security cookie authentication is used between Flink Yarn client and JobManager, JobManager and TaskManager, as well as TaskManager and TaskManager. +- Internal authentication of Yarn: The Internal authentication mechanism of Yarn is used between Yarn ResourceManager and ApplicationMaster (AM). + + .. note:: + + - Flink JobManager and Yarn ApplicationMaster are in the same process. + - If Kerberos authentication is enabled for the user's cluster, Kerberos authentication is required. + + .. table:: **Table 1** Authentication modes + + +---------------------------------+----------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Authentication Mode | Description | Configuration Method | + +=================================+======================================================================+======================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | Kerberos authentication | Currently, only keytab authentication mode is supported. | #. Download the user keytab from the KDC server, and place the keytab to a directory on the host of the Flink client. | + | | | #. Configure the following parameters in the **flink-conf.yaml** file: | + | | | | + | | | a. Keytab path | + | | | | + | | | .. code-block:: | + | | | | + | | | security.kerberos.login.keytab: /home/flinkuser/keytab/abc222.keytab | + | | | | + | | | Note: | + | | | | + | | | **/home/flinkuser/keytab/abc222.keytab** indicates the user directory. | + | | | | + | | | b. Principal name | + | | | | + | | | .. code-block:: | + | | | | + | | | security.kerberos.login.principal: abc222 | + | | | | + | | | c. In HA mode, if ZooKeeper is configured, the Kerberos authentication configuration items must be configured as follows: | + | | | | + | | | .. code-block:: | + | | | | + | | | zookeeper.sasl.disable: false | + | | | security.kerberos.login.contexts: Client | + | | | | + | | | d. If you want to perform Kerberos authentication between Kafka client and Kafka broker, set the value as follows: | + | | | | + | | | .. 
code-block:: | + | | | | + | | | security.kerberos.login.contexts: Client,KafkaClient | + +---------------------------------+----------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Security cookie authentication | ``-`` | #. Generate the **generate_keystore.sh** script by referring to :ref:`Example of Issuing a Certificate ` and place it in the **bin** directory of the Flink client. In the **bin** directory of the Flink client, run the **generate_keystore.sh** script to generate security cookie, **flink.keystore**, and **flink.truststore**. | + | | | | + | | | Run the **sh generate_keystore.sh** command and enter the user-defined password. The password cannot contain **#**. | + | | | | + | | | .. note:: | + | | | | + | | | After the script is executed, the **flink.keystore** and **flink.truststore** files are generated in the **conf** directory on the Flink client. In the **flink-conf.yaml** file, default values are specified for following parameters: | + | | | | + | | | - Set **security.ssl.keystore** to the absolute path of the **flink.keystore** file. | + | | | - Set **security.ssl.truststore** to the absolute path of the **flink.truststore** file. | + | | | | + | | | - Set **security.cookie** to a random password automatically generated by the **generate_keystore.sh** script. | + | | | - By default, **security.ssl.encrypt.enabled: false** is set in the **flink-conf.yaml** file by default. The **generate_keystore.sh** script sets **security.ssl.key-password**, **security.ssl.keystore-password**, and **security.ssl.truststore-password** to the password entered when the **generate_keystore.sh** script is called. | + | | | | + | | | - If ciphertext is required and **security.ssl.encrypt.enabled: true**, is set in the **flink-conf.yaml** file, the **generate_keystore.sh** script does not set **security.ssl.key-password**, **security.ssl.keystore-password**, and **security.ssl.truststore-password**. To obtain the values, use the Manager plaintext encryption API by running the following command: **curl -k -i -u** *Username*\ **:**\ *Password* **-X POST -HContent-type:application/json -d '{"plainText":"**\ *Password*\ **"}' 'https://**\ *x.x.x.x*\ **:28443/web/api/v2/tools/encrypt'** | + | | | | + | | | In the preceding command, *Username*\ **:**\ *Password* indicates the user name and password for logging in to the system. The password of **"plainText"** indicates the one used to call the **generate_keystore.sh** script. *x.x.x.x* indicates the floating IP address of Manager. | + | | | | + | | | #. Set **security.enable: true** in the **flink-conf.yaml** file and check whether **security cookie** is configured successfully. Example: | + | | | | + | | | .. code-block:: | + | | | | + | | | security.cookie: ae70acc9-9795-4c48-ad35-8b5adc8071744f605d1d-2726-432e-88ae-dd39bfec40a9 | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the SSL certificate and save it to the Flink client. 
For details, see :ref:`Example of Issuing a Certificate `. | + +---------------------------------+----------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Internal authentication of Yarn | This authentication mode does not need to be configured by the user. | ``-`` | + +---------------------------------+----------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + One Flink cluster supports only one user. One user can create multiple Flink clusters. + +.. _mrs_01_1583__en-us_topic_0000001219029049_section270112348585: + +Encrypted Transmission +---------------------- + +Flink uses following encrypted transmission modes: + +- Encrypted transmission inside Yarn: It is used between the Flink Yarn client and Yarn ResourceManager, as well as Yarn ResourceManager and JobManager. +- SSL transmission: SSL transmission is used between Flink Yarn client and JobManager, JobManager and TaskManager, as well as TaskManager and TaskManager. +- Encrypted transmission inside Hadoop: The internal encrypted transmission mode of Hadoop used between JobManager and HDFS, TaskManager and HDFS, JobManager and ZooKeeper, as well as TaskManager and ZooKeeper. + +.. note:: + + Configuration about SSL encrypted transmission is mandatory while configuration about encryption of Yarn and Hadoop is not required. + +To configure SSL encrypted transmission, configure the following parameters in the **flink-conf.yaml** file on the client: + +#. Enable SSL and configure the SSL encryption algorithm. see :ref:`Table 2 `. Modify the parameters as required. + + .. _mrs_01_1583__en-us_topic_0000001219029049_table4164102001915: + + .. table:: **Table 2** Parameter description + + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + | Parameter | Example Value | Description | + +==============================+=====================================================================================================================================================+================================================+ + | security.ssl.enabled | true | Enable SSL. 
| + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + | akka.ssl.enabled | true | Enable Akka SSL. | + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + | blob.service.ssl.enabled | true | Enable SSL for the Blob channel. | + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + | taskmanager.data.ssl.enabled | true | Enable SSL transmissions between TaskManagers. | + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + | security.ssl.algorithms | TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 | Configure the SSL encryption algorithm. | + +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+ + + .. note:: + + Enabling SSL for data transmission between TaskManagers may pose great impact on the system performance. + +#. Generate the **generate_keystore.sh** script by referring to :ref:`Example of Issuing a Certificate ` and place it in the **bin** directory of the Flink client. In the **bin** directory of the Flink client, run the **sh generate_keystore.sh** ** command. For details, see :ref:`Authentication and Encryption `. The configuration items in :ref:`Table 3 ` are set by default. You can also configure them manually. + + .. _mrs_01_1583__en-us_topic_0000001219029049_table5150181111227: + + .. table:: **Table 3** Parameter description + + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Example Value | Description | + +==================================+==========================+===========================================================================================================================================================+ + | security.ssl.keystore | ${path}/flink.keystore | Path for storing the **keystore**. **flink.keystore** indicates the name of the **keystore** file generated by the **generate_keystore.sh\*** tool. | + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | security.ssl.keystore-password | 123456 | Password of the **keystore**. **123456** indicates a user-defined password is required. 
| + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | security.ssl.key-password | 123456 | Password of the SSL key. **123456** indicates a user-defined password is required. | + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | security.ssl.truststore | ${path}/flink.truststore | Path for storing the **truststore**. **flink.truststore** indicates the name of the **truststore** file generated by the **generate_keystore.sh\*** tool. | + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + | security.ssl.truststore-password | 123456 | Password of the **truststore**. **123456** indicates a user-defined password is required. | + +----------------------------------+--------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + The **path** directory is a user-defined directory for storing configuration files of the SSL keystore and truststore. The commands vary according to the relative path and absolute path. For details, see :ref:`3 ` and :ref:`4 `. + +#. .. _mrs_01_1583__en-us_topic_0000001219029049_li02291947181712: + + If the **keystore** or **truststore** file path is a relative path, the Flink client directory where the command is executed needs to access this relative path directly. Either of the following method can be used to transmit the keystore and truststore file: + + - Add **-t** option to the **CLI yarn-session.sh** command to transfer the **keystore** and **truststore** file to execution nodes. Example: + + .. code-block:: + + ./bin/yarn-session.sh -t ssl/ + + - Add **-yt** option to the **flink run** command to transfer the **keystore** and **truststore** file to execution nodes. Example: + + .. code-block:: + + ./bin/flink run -yt ssl/ -ys 3 -m yarn-cluster -c org.apache.flink.examples.java.wordcount.WordCount /opt/client/Flink/flink/examples/batch/WordCount.jar + + .. note:: + + - In the preceding example, **ssl/** is the sub-directory of the Flink client directory. It is used to store configuration files of the SSL keystore and truststore. + - The relative path of **ssl/** must be accessible from the current path where the Flink client command is run. + +#. .. _mrs_01_1583__en-us_topic_0000001219029049_li15533111081818: + + If the keystore or truststore file path is an absolute path, the keystore and truststore files must exist in the absolute path on Flink Client and all nodes. + + Either of the following methods can be used to execute applications. The **-t** or **-yt** option does not need to be added to transmit the **keystore** and **truststore** files. + + - Run the **CLI yarn-session.sh** command of Flink to execute applications. Example: + + .. code-block:: + + ./bin/yarn-session.sh + + - Run the **Flink run** command to execute applications. Example: + + .. 
code-block::
+
+      ./bin/flink run -ys 3 -m yarn-cluster -c org.apache.flink.examples.java.wordcount.WordCount /opt/client/Flink/flink/examples/batch/WordCount.jar
diff --git a/doc/component-operation-guide-lts/source/using_flink/security_hardening/index.rst b/doc/component-operation-guide-lts/source/using_flink/security_hardening/index.rst new file mode 100644 index 0000000..ee3314b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_hardening/index.rst @@ -0,0 +1,18 @@
+:original_name: mrs_01_0594.html
+
+.. _mrs_01_0594:
+
+Security Hardening
+==================
+
+- :ref:`Authentication and Encryption `
+- :ref:`ACL Control `
+- :ref:`Web Security `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   authentication_and_encryption
+   acl_control
+   web_security
diff --git a/doc/component-operation-guide-lts/source/using_flink/security_hardening/web_security.rst b/doc/component-operation-guide-lts/source/using_flink/security_hardening/web_security.rst new file mode 100644 index 0000000..298acc1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_hardening/web_security.rst @@ -0,0 +1,103 @@
+:original_name: mrs_01_1585.html
+
+.. _mrs_01_1585:
+
+Web Security
+============
+
+Coding Specifications
+---------------------
+
+Note: The same coding mode must be used on the web service client and server to prevent garbled characters and to enable input verification.
+
+Security hardening: UTF-8 is applied to the response messages of the web server.
+
+Whitelist-based Filter of IP Addresses
+--------------------------------------
+
+Note: An IP filter must be added to the web server to filter unauthorized requests by source IP address and prevent unauthorized login.
+
+Security hardening: Add **jobmanager.web.allow-access-address** to enable the IP filter. By default, only Yarn users are supported.
+
+.. note::
+
+   After the client is installed, you need to add the IP address of the client node to the **jobmanager.web.allow-access-address** configuration item.
+
+Preventing Sending the Absolute Paths to the Client
+---------------------------------------------------
+
+Note: If an absolute path is sent to a client, the directory structure of the server is exposed, increasing the risk that attackers probe and attack the system.
+
+Security hardening: If the Flink configuration file contains a parameter starting with a slash (/), the first-level directory is deleted.
+
+Same-origin Policy
+------------------
+
+Two URLs are of the same origin only if they use the same protocol, host, and port. Resources of different origins cannot access each other, unless the origin of the visitor is specified on the host of the service to be visited.
+
+Security hardening: The default value of the **Access-Control-Allow-Origin** response header is the IP address of ResourceManager in the Yarn cluster. If the IP address is not from Yarn, mutual access is not allowed.
+
+Preventing Sensitive Information Disclosure
+-------------------------------------------
+
+Web pages containing sensitive data must not be cached, to avoid leakage of sensitive information or data crosstalk among users who access the Internet through the proxy server.
+
+Security hardening: Add the **Cache-Control**, **Pragma**, and **Expires** security headers. The default values are **Cache-Control: no-store**, **Pragma: no-cache**, and **Expires: 0**.
+
+This hardening prevents content exchanged between Flink and the web server from being cached.
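+
+For illustration only, a response from the hardened web server would carry headers similar to the following. This is an assumed example that consolidates the hardening items described above; the actual values, in particular the **Access-Control-Allow-Origin** address, depend on the cluster:
+
+.. code-block::
+
+   Content-Type: text/html;charset=UTF-8
+   Access-Control-Allow-Origin: <IP address of Yarn ResourceManager>
+   Cache-Control: no-store
+   Pragma: no-cache
+   Expires: 0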
+
+Anti-Hijacking
+--------------
+
+Since hotlinking and clickjacking use framing technologies, security hardening is required to prevent such attacks.
+
+Security hardening: The **X-Frame-Options** security header is added to specify whether the browser is allowed to load pages in an **iframe**, **frame**, or **object**. The default value is **X-Frame-Options: DENY**, indicating that no pages can be nested in an **iframe**, **frame**, or **object**.
+
+Logging Calls of the Web Service APIs
+-------------------------------------
+
+Calls to the Flink **webmonitor** RESTful APIs are logged.
+
+The **jobmanager.web.accesslog.enable** parameter can be added to enable the access log. The default value is **true**, and logs are stored in a separate **webaccess.log** file.
+
+Cross-Site Request Forgery Prevention
+-------------------------------------
+
+In **Browser/Server** applications, CSRF must be prevented for operations that modify server data, such as adding, modifying, and deleting. CSRF forces end users to perform unintended operations on the current web application.
+
+Security hardening: Only two POST APIs, one DELETE API, and the GET APIs are reserved for modification requests. All other APIs are deleted.
+
+Troubleshooting
+---------------
+
+When an application is abnormal, exception information is filtered, logged, and returned to the client.
+
+Security hardening:
+
+- A default error message page is provided to filter information and log detailed error information.
+- Four configuration parameters are added to redirect the error page to a specified URL provided by FusionInsight, preventing exposure of unnecessary information.
+
+  .. table:: **Table 1** Parameter description
+
+     +---------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------+
+     | Parameter                       | Description                                                                                    | Default Value | Mandatory |
+     +=================================+================================================================================================+===============+===========+
+     | jobmanager.web.403-redirect-url | Web page access error 403. If a 403 error occurs, the page is redirected to a specified URL.  | ``-``         | Yes       |
+     +---------------------------------+------------------------------------------------------------------------------------------------+---------------+-----------+
+     | jobmanager.web.404-redirect-url | Web page access error 404. If a 404 error occurs, the page is redirected to a specified URL.  | ``-``         | Yes       |
+     +---------------------------------+------------------------------------------------------------------------------------------------+---------------+-----------+
+     | jobmanager.web.415-redirect-url | Web page access error 415. If a 415 error occurs, the page is redirected to a specified URL.  | ``-``         | Yes       |
+     +---------------------------------+------------------------------------------------------------------------------------------------+---------------+-----------+
+     | jobmanager.web.500-redirect-url | Web page access error 500. If a 500 error occurs, the page is redirected to a specified URL.  | ``-``         | Yes       |
+     +---------------------------------+------------------------------------------------------------------------------------------------+---------------+-----------+
+
+HTML5 Security
+--------------
+
+HTML5 is a next-generation web development specification that provides new functions and extends tags for developers. These new tags increase the attack surface and may incur attack risks.
For example, cross-domain resource sharing, storage on the client, WebWorker, WebRTC, and WebSocket. + +Security hardening: Add the **Access-Control-Allow-Origin** parameter. For example, if you want to enable the cross-domain resource sharing, configure the **Access-Control-Allow-Origin** parameter of the HTTP response header. + +.. note:: + + Flink does not involve security risks of functions such as storage on the client, WebWorker, WebRTC, and WebSocket. diff --git a/doc/component-operation-guide-lts/source/using_flink/security_statement.rst b/doc/component-operation-guide-lts/source/using_flink/security_statement.rst new file mode 100644 index 0000000..cbbd394 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/security_statement.rst @@ -0,0 +1,11 @@ +:original_name: mrs_01_1586.html + +.. _mrs_01_1586: + +Security Statement +================== + +- All security functions of Flink are provided by the open source community or self-developed. Security features that need to be configured by users, such as authentication and SSL encrypted transmission, may affect performance. +- As a big data computing and analysis platform, Flink does not detect sensitive information. Therefore, you need to ensure that the input data is not sensitive. +- You can evaluate whether configurations are secure as required. +- For any security-related problems, contact O&M support. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_flink_from_scratch.rst b/doc/component-operation-guide-lts/source/using_flink/using_flink_from_scratch.rst new file mode 100644 index 0000000..691e72f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_flink_from_scratch.rst @@ -0,0 +1,191 @@ +:original_name: mrs_01_0473.html + +.. _mrs_01_0473: + +Using Flink from Scratch +======================== + +Scenario +-------- + +This section describes how to use Flink to run wordcount jobs. + +Prerequisites +------------- + +- Flink has been installed in the MRS cluster and all components in the cluster are running properly. +- The cluster client has been installed, for example, in the **/opt/hadoopclient** directory. + +Procedure +--------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following commands to go to the client installation directory. + + **cd /opt/hadoopclient** + +#. Run the following command to initialize environment variables: + + **source /opt/hadoopclient/bigdata_env** + +#. If Kerberos authentication is enabled for the cluster, perform the following substeps. If Kerberos authentication is not enabled for the cluster, skip the following substeps. + + a. Create a user, for example, **test**, for submitting Flink jobs. + + After a human-machine user is created, log in to FusionInsight Manager as the new user and change the initial password as prompted. + + .. note:: + + To submit or run jobs on Flink, the user must have the following permissions: + + - If Ranger authentication is enabled, the current user must belong to the **hadoop** group or the user has been granted the **/flink** read and write permissions in Ranger. + - If Ranger authentication is disabled, the current user must belong to the **hadoop** group. + + b. Log in to FusionInsight Manager and choose **System > Permission > User**. 
On the displayed page, locate the row that contains the added user, click **More** in the **Operation** column, and select **Download Authentication Credential** to download the authentication credential file of the user to the local PC and decompress the file. + + c. Copy the decompressed **user.keytab** and **krb5.conf** files to the **/opt/hadoopclient/Flink/flink/conf** directory on the client node. + + d. Log in to the client node and add the service IP address of the client node and the floating IP address of FusionInsight Manager to the **jobmanager.web.allow-access-address** configuration item in the **/opt/hadoopclient/Flink/flink/conf/flink-conf.yaml** file. Use commas (,) to separate the IP addresses. + + **vi /opt/hadoopclient/Flink/flink/conf/flink-conf.yaml** + + e. Configure security authentication. + + Add the **keytab** path and username to the **/opt/hadoopclient/Flink/flink/conf/flink-conf.yaml** configuration file. + + .. code-block:: + + security.kerberos.login.keytab: + security.kerberos.login.principal: + + Example: + + .. code-block:: + + security.kerberos.login.keytab: /opt/hadoopclient/Flink/flink/conf/user.keytab + security.kerberos.login.principal: test + + f. Configure security hardening by referring to :ref:`Authentication and Encryption `. Run the following commands to set a password for submitting jobs. + + **cd /opt/hadoopclient/Flink/flink/bin** + + **sh generate_keystore.sh** + + The script automatically changes the SSL-related parameter values in the **/opt/hadoopclient/Flink/flink/conf/flink-conf.yaml** file. + + g. Configure paths for the client to access the **flink.keystore** and **flink.truststore** files. + + - Absolute path + + After the **generate_keystore.sh** script is executed, the **flink.keystore** and **flink.truststore** file paths are automatically set to absolute paths in the **flink-conf.yaml** file by default. In this case, you need to place the **flink.keystore** and **flink.truststore** files in the **conf** directory to the absolute paths of the Flink client and each Yarn node, respectively. + + - Relative path (recommended) + + Perform the following steps to set the file paths of **flink.keystore** and **flink.truststore** to relative paths and ensure that the directory where the Flink client command is executed can directly access the relative paths. + + #. Create a directory, for example, **ssl**, in **/opt/hadoopclient/Flink/flink/conf/**. + + **cd /opt/hadoopclient/Flink/flink/conf/** + + **mkdir ssl** + + #. Move the **flink.keystore** and **flink.truststore** files to the new paths. + + **mv flink.keystore ssl/** + + **mv flink.truststore ssl/** + + #. Change the values of the following parameters to relative paths in the **flink-conf.yaml** file: + + **vi /opt/hadoopclient/Flink/flink/conf/flink-conf.yaml** + + .. code-block:: + + security.ssl.keystore: ssl/flink.keystore + security.ssl.truststore: ssl/flink.truststore + +#. Run a wordcount job. 
+ + - Normal cluster (Kerberos authentication disabled) + + - Run the following commands to start a session and submit a job in the session: + + **yarn-session.sh -nm "**\ *session-name*\ **"** + + **flink run /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + + - Run the following command to submit a single job on Yarn: + + **flink run -m yarn-cluster /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + + - Security cluster (Kerberos authentication enabled) + + - If the **flink.keystore** and **flink.truststore** file paths are relative paths: + + - Run the following command in the directory at the same level as **ssl** to start the session and submit the job in the session. **ssl/** is a relative path. + + **cd /opt/hadoopclient/Flink/flink/conf/** + + **yarn-session.sh -t ssl/ -nm "**\ *session-name*\ **"** + + .. code-block:: + + ... + Cluster started: Yarn cluster with application id application_1624937999496_0017 + JobManager Web Interface: http://192.168.1.150:32261 + + Start a new client connection and submit the job: + + **source /opt/hadoopclient/bigdata_env** + + **flink run /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + + .. code-block:: + + ... + Job has been submitted with JobID 587d5498fff18d8b2501fdf7ebb9c4fb + Program execution finished + Job with JobID 587d5498fff18d8b2501fdf7ebb9c4fb has finished. + Job Runtime: 19917 ms + + - Run the following command to submit a single job on Yarn: + + **cd /opt/hadoopclient/Flink/flink/conf/** + + **flink run -m yarn-cluster -yt ssl/ /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + + .. code-block:: + + ... + Cluster started: Yarn cluster with application id application_1624937999496_0016 + Job has been submitted with JobID e9c59fb48f44feae7b62dd90336d6d7f + Program execution finished + Job with JobID e9c59fb48f44feae7b62dd90336d6d7f has finished. + Job Runtime: 18155 ms + + - If the **flink.keystore** and **flink.truststore** file paths are absolute paths: + + - Run the following commands to start a session and submit a job in the session: + + **cd /opt/hadoopclient/Flink/flink/conf/** + + **yarn-session.sh -nm "**\ *session-name*\ **"** + + **flink run /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + + - Run the following command to submit a single job on Yarn: + + **flink run -m yarn-cluster /opt/hadoopclient/Flink/flink/examples/streaming/WordCount.jar** + +#. Log in to FusionInsight Manager as a running user, go to the native page of the Yarn service, find the application of the corresponding job, and click the application name to go to the job details page. + + - If the job is not completed, click **Tracking URL** to go to the native Flink page and view the job running information. + + - If the job submitted in a session has been completed, you can click **Tracking URL** to log in to the native Flink service page to view job information. + + + .. figure:: /_static/images/en-us_image_0000001349058841.png + :alt: **Figure 1** Application + + **Figure 1** Application diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/accessing_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/accessing_the_flink_web_ui.rst new file mode 100644 index 0000000..70237c8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/accessing_the_flink_web_ui.rst @@ -0,0 +1,41 @@ +:original_name: mrs_01_24019.html + +.. 
_mrs_01_24019: + +Accessing the Flink Web UI +========================== + +Scenario +-------- + +After Flink is installed in an MRS cluster, you can connect to clusters and data as well as manage stream tables and jobs using the Flink web UI. + +This section describes how to access the Flink web UI in an MRS cluster. + +.. note:: + + You are advised to use Google Chrome 50 or later to access the Flink web UI. The Internet Explorer may be incompatible with the Flink web UI. + +Impact on the System +-------------------- + +Site trust must be added to the browser when you access Manager and the Flink web UI for the first time. Otherwise, the Flink web UI cannot be accessed. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user with **FlinkServer Admin Privilege**. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Flink**. + +#. On the right of **Flink WebUI**, click the link to access the Flink web UI. + + The Flink web UI provides the following functions: + + - System management: + + - Cluster connection management allows you to create, view, edit, test, and delete a cluster connection. + - Data connection management allows you to create, view, edit, test, and delete a data connection. Data connection types include HDFS, Kafka. + - Application management allows you to create, view, and delete an application. + + - UDF management allows you to upload and manage UDF JAR packages and customize functions to extend SQL statements to meet personalized requirements. + - Stream table management allows you to create, view, edit, and delete a stream table. + - Job management allows you to create, view, start, develop, edit, stop, and delete a job. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_cluster_connection_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_cluster_connection_on_the_flink_web_ui.rst new file mode 100644 index 0000000..80770b3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_cluster_connection_on_the_flink_web_ui.rst @@ -0,0 +1,91 @@ +:original_name: mrs_01_24021.html + +.. _mrs_01_24021: + +Creating a Cluster Connection on the Flink Web UI +================================================= + +Scenario +-------- + +Different clusters can be accessed by configuring the cluster connection. + +.. _mrs_01_24021__en-us_topic_0000001173789648_section878113401693: + +Creating a Cluster Connection +----------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. + +#. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. + +#. Click **Create Cluster Connection**. On the displayed page, set parameters by referring to :ref:`Table 1 ` and click **OK**. + + .. _mrs_01_24021__en-us_topic_0000001173789648_table134890201518: + + .. 
table:: **Table 1** Parameters for creating a cluster connection + + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+============================================================================================================================================================================================+ + | Cluster Connection Name | Name of the cluster connection, which can contain a maximum of 100 characters. Only letters, digits, and underscores (_) are allowed. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Description of the cluster connection name. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | FusionInsight HD Version | Set a cluster version. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Secure Version | - If the secure version is used, select **Yes** for a security cluster. Enter the username and upload the user credential. | + | | - If not, select **No**. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Username | The user must have the minimum permissions for accessing services in the cluster. The name can contain a maximum of 100 characters. Only letters, digits, and underscores (_) are allowed. | + | | | + | | This parameter is available only when **Secure Version** is set to **Yes**. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Client Profile | Client profile of the cluster, in TAR format. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | User Credential | User authentication credential in FusionInsight Manager in TAR format. | + | | | + | | This parameter is available only when **Secure Version** is set to **Yes**. | + | | | + | | Files can be uploaded only after the username is entered. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + To obtain the cluster client configuration files, perform the following steps: + + a. Log in to FusionInsight Manager and choose **Cluster** > **Dashboard**. + b. Choose **More** > **Download Client** > **Configuration Files Only**, select a platform type, and click **OK**. 
+ + To obtain the user credential, perform the following steps: + + a. Log in to FusionInsight Manager and click **System**. + b. In the **Operation** column of the user, choose **More** > **Download Authentication Credential**, select a cluster, and click **OK**. + +Editing a Cluster Connection +---------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. +#. In the **Operation** column of the item to be modified, click **Edit**. On the displayed page, modify the connection information by referring to :ref:`Table 1 ` and click **OK**. + +Testing a Cluster Connection +---------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. +#. In the **Operation** column of the item to be tested, click **Test**. + +Searching for a Cluster Connection +---------------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. +#. In the upper right corner of the page, you can enter a search criterion to search for and view the cluster connection based on **Cluster Connection Name**. + +Deleting a Cluster Connection +----------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. +#. In the **Operation** column of the item to be deleted, click **Delete**, and click **OK** in the displayed page. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_data_connection_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_data_connection_on_the_flink_web_ui.rst new file mode 100644 index 0000000..6afea65 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_a_data_connection_on_the_flink_web_ui.rst @@ -0,0 +1,73 @@ +:original_name: mrs_01_24022.html + +.. _mrs_01_24022: + +Creating a Data Connection on the Flink Web UI +============================================== + +Scenario +-------- + +Different data services can be accessed through data connections. Currently, FlinkServer supports HDFS, Kafka data connections. + +Creating a Data Connection +-------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. + +#. Choose **System Management** > **Data Connection Management**. The **Data Connection Management** page is displayed. + +#. Click **Create Data Connection**. On the displayed page, select a data connection type, enter information by referring to :ref:`Table 1 `, and click **OK**. + + .. _mrs_01_24022__en-us_topic_0000001219230571_table134890201518: + + .. 
table:: **Table 1** Parameters for creating a data connection + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=============================================================================================================================================================+=====================================+ + | Data Connection Type | Type of the data connection, which can be **HDFS**, **Kafka**. | ``-`` | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Data Connection Name | Name of the data connection, which can contain a maximum of 100 characters. Only letters, digits, and underscores (_) are allowed. | ``-`` | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Cluster Connection | Cluster connection name in configuration management. | ``-`` | + | | | | + | | This parameter is mandatory for HDFS data connections. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Kafka broker | Connection information about Kafka broker instances. The format is *IP address*:*Port number*. Use commas (,) to separate multiple instances. | 192.168.0.1:21005,192.168.0.2:21005 | + | | | | + | | This parameter is mandatory for Kafka data connections. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Authentication Mode | - **SIMPLE**: indicates that the connected service is in non-security mode and does not need to be authenticated. | ``-`` | + | | - **KERBEROS**: indicates that the connected service is in security mode and the Kerberos protocol for security authentication is used for authentication. | | + | | | | + | | This parameter is mandatory for Kafka data connections. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + +Editing a Data Connection +------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Data Connection Management**. The **Data Connection Management** page is displayed. +#. In the **Operation** column of the item to be modified, click **Edit**. On the displayed page, modify the connection information by referring to :ref:`Table 1 ` and click **OK**. + +Testing a Data Connection +------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Data Connection Management**. The **Data Connection Management** page is displayed. +#. 
In the **Operation** column of the item to be tested, click **Test**. + +Searching for a Data Connection +------------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Data Connection Management**. The **Data Connection Management** page is displayed. +#. In the upper right corner of the page, you can search for a data connection by name. + +Deleting a Data Connection +-------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Choose **System Management** > **Data Connection Management**. The **Data Connection Management** page is displayed. +#. In the **Operation** column of the item to be deleted, click **Delete**, and click **OK** in the displayed page. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_an_application_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_an_application_on_the_flink_web_ui.rst new file mode 100644 index 0000000..4dec868 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/creating_an_application_on_the_flink_web_ui.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_24020.html + +.. _mrs_01_24020: + +Creating an Application on the Flink Web UI +=========================================== + +Scenario +-------- + +Applications can be used to isolate different upper-layer services. + +Creating an Application +----------------------- + +#. Access the Flink web UI as a user with the FlinkServer management permission. For details, see :ref:`Accessing the Flink Web UI `. + +#. Choose **System Management** > **Application Management**. + +#. Click **Create Application**. On the displayed page, set parameters by referring to :ref:`Table 1 ` and click **OK**. + + .. _mrs_01_24020__en-us_topic_0000001219149251_table2048293612324: + + .. table:: **Table 1** Parameters for creating an application + + +-------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=============+================================================================================================================================================+ + | Application | Name of the application to be created. The name can contain a maximum of 32 characters. Only letters, digits, and underscores (_) are allowed. | + +-------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Description of the application to be created. The value can contain a maximum of 85 characters. | + +-------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + + After the application is created, you can switch to the application to be operated in the upper left corner of the Flink web UI and develop jobs. 
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/authentication_based_on_users_and_roles.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/authentication_based_on_users_and_roles.rst new file mode 100644 index 0000000..41f8398 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/authentication_based_on_users_and_roles.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_24049.html + +.. _mrs_01_24049: + +Authentication Based on Users and Roles +======================================= + +This section describes how to create and configure a FlinkServer role on Manager as the system administrator. A FlinkServer role can be configured with FlinkServer administrator permission and the permissions to edit and view applications. + +You need to set permissions for the specified user in FlinkServer so that they can update, query, and delete data. + +Prerequisites +------------- + +The system administrator has planned permissions based on business needs. + +Procedure +--------- + +#. Log in to Manager. + +#. Choose **System** > **Permission** > **Role**. + +#. On the displayed page, click **Create Role** and specify **Role Name** and **Description**. + +#. Set **Configure Resource Permission**. + + FlinkServer permissions are as follows: + + - **FlinkServer Admin Privilege**: highest-level permission. Users with the permission can perform service operations on all FlinkServer applications. + - **FlinkServer Application**: Users can set **application view** and **applications management** permissions on applications. + + .. table:: **Table 1** Setting a role + + +------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Role Authorization | + +================================================+==============================================================================================================================================+ + | Setting the administrator operation permission | In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **Flink** and select **FlinkServer Admin Privilege**. | + +------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting a specified permission on applications | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **Flink** > **FlinkServer Application**. | + | | b. In the **Permission** column, select **application view** or **applications management**. | + +------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **OK**. Return to role management page. + + .. note:: + + After the FlinkServer is created, create a FlinkServer user and bind the user to the role and user group. 
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst new file mode 100644 index 0000000..cad4bc7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24047.html + +.. _mrs_01_24047: + +FlinkServer Permissions Management +================================== + +- :ref:`Overview ` +- :ref:`Authentication Based on Users and Roles ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + overview + authentication_based_on_users_and_roles diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst new file mode 100644 index 0000000..18af61c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24048.html + +.. _mrs_01_24048: + +Overview +======== + +User **admin** of Manager does not have the FlinkServer service operation permission. To perform FlinkServer service operations, you need to grant related permission to the user. + +Applications (tenants) in FlinkServer are the maximum management scope, including cluster connection management, data connection management, application management, stream table management, and job management. + +There are three types of resource permissions for FlinkServer, as shown in :ref:`Table 1 `. + +.. _mrs_01_24048__en-us_topic_0000001173949576_table663518214115: + +.. table:: **Table 1** FlinkServer resource permissions + + +--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Name | Description | Remarks | + +======================================+=========================================================================================================================================================================+====================================================================================================================================================================+ + | FlinkServer administrator permission | Users who have the permission can edit and view all applications. | This is the highest-level permission of FlinkServer. If you have the FlinkServer administrator permission, you have the permission on all applications by default. | + +--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Application edit permission | Users who have the permission can create, edit, and delete cluster connections and data connections. 
They can also create stream tables as well as create and run jobs. | In addition, users who have the permission can view current applications. | + +--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Application view permission | Users who have the permission can view applications. | ``-`` | + +--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst new file mode 100644 index 0000000..c661bd5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_24014.html + +.. _mrs_01_24014: + +Using the Flink Web UI +====================== + +- :ref:`Overview ` +- :ref:`FlinkServer Permissions Management ` +- :ref:`Accessing the Flink Web UI ` +- :ref:`Creating an Application on the Flink Web UI ` +- :ref:`Creating a Cluster Connection on the Flink Web UI ` +- :ref:`Creating a Data Connection on the Flink Web UI ` +- :ref:`Managing Tables on the Flink Web UI ` +- :ref:`Managing Jobs on the Flink Web UI ` +- :ref:`Managing UDFs on the Flink Web UI ` +- :ref:`Interconnecting FlinkServer with External Components ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + overview/index + flinkserver_permissions_management/index + accessing_the_flink_web_ui + creating_an_application_on_the_flink_web_ui + creating_a_cluster_connection_on_the_flink_web_ui + creating_a_data_connection_on_the_flink_web_ui + managing_tables_on_the_flink_web_ui + managing_jobs_on_the_flink_web_ui + managing_udfs_on_the_flink_web_ui/index + interconnecting_flinkserver_with_external_components/index diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/index.rst new file mode 100644 index 0000000..9e8b449 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/index.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_24226.html + +.. _mrs_01_24226: + +Interconnecting FlinkServer with External Components +==================================================== + +- :ref:`Interconnecting FlinkServer with ClickHouse ` +- :ref:`Interconnecting FlinkServer with HBase ` +- :ref:`Interconnecting FlinkServer with HDFS ` +- :ref:`Interconnecting FlinkServer with Hive ` +- :ref:`Interconnecting FlinkServer with Hudi ` +- :ref:`Interconnecting FlinkServer with Kafka ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + interconnecting_flinkserver_with_clickhouse + interconnecting_flinkserver_with_hbase + interconnecting_flinkserver_with_hdfs + interconnecting_flinkserver_with_hive + interconnecting_flinkserver_with_hudi + interconnecting_flinkserver_with_kafka diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst new file mode 100644 index 0000000..dbdc4c1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst @@ -0,0 +1,293 @@ +:original_name: mrs_01_24148.html + +.. _mrs_01_24148: + +Interconnecting FlinkServer with ClickHouse +=========================================== + +Scenario +-------- + +Flink interconnects with the ClickHouseBalancer instance of ClickHouse to read and write data, preventing ClickHouse traffic distribution problems. + +Prerequisites +------------- + +- Services such as ClickHouse, HDFS, Yarn, Flink, and HBase have been installed in the cluster. +- The client has been installed, for example, in **/opt/Bigdata/client**. + +Mapping Between Flink SQL and ClickHouse Data Types +--------------------------------------------------- + +=================== ==================== +Flink SQL Data Type ClickHouse Data Type +=================== ==================== +BOOLEAN UInt8 +TINYINT Int8 +SMALLINT Int16 +INTEGER Int32 +BIGINT Int64 +FLOAT Float32 +DOUBLE Float64 +CHAR String +VARCHAR String +VARBINARY FixedString +DATE Date +TIMESTAMP DateTime +DECIMAL Decimal +=================== ==================== + +Procedure +--------- + +#. Log in to the node where the client is installed as user **root**. + +#. Run the following command to go to the client installation directory: + + **cd /opt/Bigdata/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The current user must have the permission to create ClickHouse tables. If Kerberos authentication is disabled for the current cluster, skip this step: + + **kinit** *Component service user* + + Example: **kinit** **clickhouseuser** + +#. Connect to the ClickHouse client. For details, see :ref:`Using ClickHouse from Scratch `. + + - Normal mode: + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *Username* **--password '**\ *Password*\ **'** **--port** *ClickHouse port number* + + - Security mode: + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *Username* **--password '**\ *Password*\ **'--port** *ClickHouse port number* **--secure** **--multiline** + +#. Run the following statements to create a replication table and a distributed table. + + a. Create a replication table **default.test1**. + + .. 
code-block:: + + CREATE TABLE default.test1 on cluster default_cluster + ( + `pid` Int8, + `uid` UInt8, + `Int_16` Int16, + `Int_32` Int32, + `Int_64` Int64, + `String_x` String, + `String_y` String, + `float_32` Float32, + `float_64` Float64, + `Decimal_x` Decimal32(2), + `Date_x` Date, + `DateTime_x` DateTime + ) + ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/test1','{replica}') + PARTITION BY pid + ORDER BY (pid, DateTime_x); + + b. Create a distributed table **test1_all**. + + .. code-block:: + + CREATE TABLE test1_all ON CLUSTER default_cluster + ( + `pid` Int8, + `uid` UInt8, + `Int_16` Int16, + `Int_32` Int32, + `Int_64` Int64, + `String_x` String, + `String_y` String, + `float_32` Float32, + `float_64` Float64, + `Decimal_x` Decimal32(2), + `Date_x` Date, + `DateTime_x` DateTime + ) + ENGINE = Distributed(default_cluster, default, test1, rand()); + +#. Log in to Manager and choose **Cluster** > **Services** > **Flink**. In the **Basic Information** area, click the link on the right of **Flink WebUI** to access the Flink web UI. + +#. Create a Flink SQL job and set **Task Type** to **Stream job**. For details, see :ref:`Creating a Job `. On the job development page, configure the job parameters as follows and start the job. Select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + + - If the current MRS cluster is in security mode, perform the following operations: + + .. code-block:: + + create table kafkasource( + `pid` TINYINT, + `uid` BOOLEAN, + `Int_16` SMALLINT, + `Int_32` INTEGER, + `Int_64` BIGINT, + `String_x` CHAR, + `String_y` VARCHAR(10), + `float_32` FLOAT, + `float_64` DOUBLE, + `Decimal_x` DECIMAL(9,2), + `Date_x` DATE, + `DateTime_x` TIMESTAMP + ) with( + 'connector' = 'kafka', + 'topic' = 'input', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'group1', + 'scan.startup.mode' = 'earliest-offset', + 'format' = 'json', + 'properties.sasl.kerberos.service.name' = 'kafka', + 'properties.security.protocol' = 'SASL_PLAINTEXT', + 'properties.kerberos.domain.name' = 'hadoop.System domain name' + ); + CREATE TABLE cksink ( + `pid` TINYINT, + `uid` BOOLEAN, + `Int_16` SMALLINT, + `Int_32` INTEGER, + `Int_64` BIGINT, + `String_x` CHAR, + `String_y` VARCHAR(10), + `float_32` FLOAT, + `float_64` DOUBLE, + `Decimal_x` DECIMAL(9,2), + `Date_x` DATE, + `DateTime_x` TIMESTAMP + ) WITH ( + 'connector' = 'jdbc', + 'url' = 'jdbc:clickhouse://ClickHouseBalancer instance IP address:21422/default?ssl=true&sslmode=none', + 'username' = 'ClickHouse user. For details, see the note below.', + 'password' = 'ClickHouse user password. For details, see the note below.', + 'table-name' = 'test1_all', + 'driver' = 'ru.yandex.clickhouse.ClickHouseDriver', + 'sink.buffer-flush.max-rows' = '0', + 'sink.buffer-flush.interval' = '60s' + ); + Insert into cksink + select + * + from + kafkasource; + + - If the current MRS cluster is in normal mode, perform the following operations: + + .. 
code-block:: + + create table kafkasource( + `pid` TINYINT, + `uid` BOOLEAN, + `Int_16` SMALLINT, + `Int_32` INTEGER, + `Int_64` BIGINT, + `String_x` CHAR, + `String_y` VARCHAR(10), + `float_32` FLOAT, + `float_64` DOUBLE, + `Decimal_x` DECIMAL(9,2), + `Date_x` DATE, + `DateTime_x` TIMESTAMP + ) with( + 'connector' = 'kafka', + 'topic' = 'kinput', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'kafka_test', + 'scan.startup.mode' = 'earliest-offset', + 'format' = 'json' + ); + CREATE TABLE cksink ( + `pid` TINYINT, + `uid` BOOLEAN, + `Int_16` SMALLINT, + `Int_32` INTEGER, + `Int_64` BIGINT, + `String_x` CHAR, + `String_y` VARCHAR(10), + `float_32` FLOAT, + `float_64` DOUBLE, + `Decimal_x` DECIMAL(9,2), + `Date_x` DATE, + `DateTime_x` TIMESTAMP + ) WITH ( + 'connector' = 'jdbc', + 'url' = 'jdbc:clickhouse://ClickHouseBalancer instance IP address:21425/default', + 'table-name' = 'test1_all', + 'driver' = 'ru.yandex.clickhouse.ClickHouseDriver', + 'sink.buffer-flush.max-rows' = '0', + 'sink.buffer-flush.interval' = '60s' + ); + Insert into cksink + select + * + from + kafkasource; + + .. note:: + + - If an MRS cluster is in the security mode, the user in the **cksink** table must have related permissions on the ClickHouse tables. For details, see :ref:`ClickHouse User and Permission Management `. + + - Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default). + + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + + - **21422**: HTTPS port number of the ClickHouseBalancer instance IP address. + + - **21425**: HTTP port number of the ClickHouseBalancer instance IP address. + + - Parameters for batch write: Flink stores data in the memory and then flushes the data to the database table when the trigger condition is met. The configurations are as follows: + + **sink.buffer-flush.max-rows**: Number of rows written to ClickHouse. The default value is **100**. + + **sink.buffer-flush.interval**: Interval for batch write. The default value is **1s**. + + If either of the two conditions is met, a sink operation is triggered. That is, data will be flushed to the database table. + + - Scenario 1: sink every 60s + + 'sink.buffer-flush.max-rows' = '0', + + 'sink.buffer-flush.interval' = '60s' + + - Scenario 2: sink every 100 rows + + 'sink.buffer-flush.max-rows' = '100', + + 'sink.buffer-flush.interval' = '0s' + + - Scenario 3: no sink + + 'sink.buffer-flush.max-rows' = '0', + + 'sink.buffer-flush.interval' = '0s' + +#. On the job management page, check whether the job status is **Running**. + +#. Execute the following script to write data to Kafka. For details, see :ref:`Managing Messages in Kafka Topics `. 
+ + **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--**\ **topic**\ *Topic name* **--producer.config ../config/producer.properties** + + For example, if the topic name is **kinput**, the script is **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic kinput** **--producer.config ../config/producer.properties**. + + Enter the message content. + + .. code-block:: + + {"pid": "3","uid":false,"Int_16": "6533","Int_32": "429496294","Int_64": "1844674407370955614","String_x": "abc1","String_y": "abc1defghi","float_32": "0.1234","float_64": "95.1","Decimal_x": "0.451236414","Date_x": "2021-05-29","DateTime_x": "2021-05-21 10:05:10"}, + {"pid": "4","uid":false,"Int_16": "6533","Int_32": "429496294","Int_64": "1844674407370955614","String_x": "abc1","String_y": "abc1defghi","float_32": "0.1234","float_64": "95.1","Decimal_x": "0.4512314","Date_x": "2021-05-29","DateTime_x": "2021-05-21 10:05:10"} + + Press **Enter** to send the message. + +#. Interconnect with ClickHouse to query the table data. + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *Username* **--password '**\ *Password*\ **'--port** *ClickHouse port number* **--secure** **--multiline** + + Run the following command to check whether data is written to a specified ClickHouse table, for example, **test1_all**. + + **select \* from test1_all;** diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hbase.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hbase.rst new file mode 100644 index 0000000..fe6c219 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hbase.rst @@ -0,0 +1,172 @@ +:original_name: mrs_01_24120.html + +.. _mrs_01_24120: + +Interconnecting FlinkServer with HBase +====================================== + +Scenario +-------- + +FlinkServer can be interconnected with HBase. The details are as follows: + +- It can be interconnected with dimension tables and sink tables. +- When HBase and Flink are in the same cluster or clusters with mutual trust, FlinkServer can be interconnected with HBase. +- If HBase and Flink are in different clusters without mutual trust, Flink in a normal cluster can be interconnected with HBase in a normal cluster. + +Prerequisites +------------- + +- The HDFS, Yarn, Flink, and HBase services have been installed in a cluster. +- The client that contains the HBase service has been installed, for example, in the **/opt/Bigdata/client** directory. + +Procedure +--------- + +#. .. _mrs_01_24120__en-us_topic_0000001173471334_li197840151912: + + Log in to the node where the client is installed as the client installation user and copy all configuration files in the **/opt/Bigdata/client/HBase/hbase/conf/** directory of HBase to an empty directory of all nodes where FlinkServer is deployed, for example, **/tmp/client/HBase/hbase/conf/**. + + Change the owner of the configuration file directory and its upper-layer directory on the FlinkServer node to **omm**. + + **chown omm: /tmp/client/HBase/hbase/conf/ -R** + + .. 
note:: + + - FlinkServer nodes: + + Log in to Manager, choose **Cluster** > **Services** > **Flink** > **Instance**, and check the **Service IP Address** of FlinkServer. + + - If the node where a FlinkServer instance is located is the node where the HBase client is installed, skip this step on this node. + +#. Log in to Manager, choose **Cluster** > **Services** > **Flink** > **Configurations** > **All Configurations**, search for the **HBASE_CONF_DIR** parameter, and enter the FlinkServer directory (for example, **/tmp/client/HBase/hbase/conf/**) to which the HBase configuration files are copied in :ref:`1 ` in **Value**. + + .. note:: + + If the node where a FlinkServer instance is located is the node where the HBase client is installed, enter the **/opt/Bigdata/client/HBase/hbase/conf/** directory of HBase in **Value** of the **HBASE_CONF_DIR** parameter. + +#. After the parameters are configured, click **Save**. After confirming the modification, click **OK**. + +#. Click **Instance**, select all FlinkServer instances, choose **More** > **Restart Instance**, enter the password, and click **OK** to restart the instances. + +#. Log in to Manager and choose **Cluster** > **Services** > **Flink**. In the **Basic Information** area, click the link on the right of **Flink WebUI** to access the Flink web UI. + +#. Create a Flink SQL job and set Task Type to Stream job. For details, see :ref:`Creating a Job `. On the job development page, configure the job parameters as follows and start the job. + + Select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + + The following example shows how to create a Flink SQL job: + + .. code-block:: + + CREATE TABLE ksource1 ( + user_id STRING, + item_id STRING, + proctime as PROCTIME() + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'ksource1', + 'properties.group.id' = 'group1', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance 1:Kafka port number,IP address of the Kafka broker instance 2:Kafka port number', + 'format' = 'json', + 'properties.sasl.kerberos.service.name' = 'kafka' + ); + + CREATE TABLE hsink1 ( + rowkey STRING, + f1 ROW < item_id STRING >, + PRIMARY KEY (rowkey) NOT ENFORCED + ) WITH ( + 'connector' = 'hbase-2.2', + 'table-name' = 'dim_province', + 'zookeeper.quorum' = 'IP address of the ZooKeeper quorumpeer instance 1:ZooKeeper port number'IP address of the ZooKeeper quorumpeer instance 2:ZooKeeper port number' + ); + + INSERT INTO + hsink1 + SELECT + user_id as rowkey, + ROW(item_id) as f1 + FROM + ksource1; + + .. note:: + + - Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default). + + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + + - IP address of the ZooKeeper quorumpeer instance + + To obtain IP addresses of all ZooKeeper quorumpeer instances, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Instance** and view the IP addresses of all the hosts where the quorumpeer instances locate. 
+
+      - Port number of the ZooKeeper client
+
+        Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Configurations** and check the value of **clientPort**. The default value is **24002**.
+
+#. On the job management page, check whether the job status is **Running**.
+
+#. Execute the following script to write data to Kafka. For details, see :ref:`Managing Messages in Kafka Topics `.
+
+   **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic** *Topic name*
+
+   For example, if the topic name is **ksource1**, the script is **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates*:*Kafka port number* **--topic ksource1**.
+
+   Enter the message content.
+
+   .. code-block::
+
+      {"user_id": "3","item_id":"333333"},
+      {"user_id": "4","item_id":"44444444"}
+
+   Press **Enter** to send the message.
+
+#. Log in to the HBase client and view the table data. For details, see :ref:`Using an HBase Client `.
+
+   **hbase shell**
+
+   **scan 'dim_province'**
+
+Submitting a Job Using the Application
+--------------------------------------
+
+- If the Flink run mode is used, you are advised to use **export HBASE_CONF_DIR=** *HBase configuration directory*, for example, **export HBASE_CONF_DIR=/opt/hbaseconf**.
+- If the Flink run-application mode is used, you can use either of the following methods to submit jobs:
+
+  - (Recommended) Add the following configurations to a table creation statement.
+
+    +----------------------------------------------------------+-------------------------------------------------------------------+
+    | Parameter                                                | Description                                                       |
+    +==========================================================+===================================================================+
+    | 'properties.hbase.rpc.protection' = 'authentication'    | This parameter must be consistent with that on the HBase server. |
+    +----------------------------------------------------------+-------------------------------------------------------------------+
+    | 'properties.hbase.security.authorization' = 'true'      | Authorization is enabled.                                         |
+    +----------------------------------------------------------+-------------------------------------------------------------------+
+    | 'properties.hbase.security.authentication' = 'kerberos' | Kerberos authentication is enabled.                               |
+    +----------------------------------------------------------+-------------------------------------------------------------------+
+
+    Example:
+
+    .. code-block::
+
+       CREATE TABLE hsink1 (
+         rowkey STRING,
+         f1 ROW < q1 STRING >,
+         PRIMARY KEY (rowkey) NOT ENFORCED
+       ) WITH (
+         'connector' = 'hbase-2.2',
+         'table-name' = 'cc',
+         'zookeeper.quorum' = 'x.x.x.x:24002',
+         'properties.hbase.rpc.protection' = 'authentication',
+         'properties.zookeeper.znode.parent' = '/hbase',
+         'properties.hbase.security.authorization' = 'true',
+         'properties.hbase.security.authentication' = 'kerberos'
+       );
+
+  - Add the HBase configuration to YarnShip.
+ + Example: -Dyarn.ship-files=/opt/hbaseconf diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hdfs.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hdfs.rst new file mode 100644 index 0000000..ef89755 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hdfs.rst @@ -0,0 +1,196 @@ +:original_name: mrs_01_24247.html + +.. _mrs_01_24247: + +Interconnecting FlinkServer with HDFS +===================================== + +Scenario +-------- + +This section describes the data definition language (DDL) of HDFS as a sink table, as well as the WITH parameters and example code for creating a sink table, and provides guidance on how to perform operations on the FlinkServer job management page. + +If your Kafka cluster is in security mode, the following example SQL statements can be used. + +Prerequisites +------------- + +- The HDFS, Yarn, Kafka, and Flink services have been installed in a cluster. +- The client that contains the HDFS service has been installed, for example, in the **/opt/Bigdata/client** directory. +- You have created a user with **FlinkServer Admin Privilege**, for example, **flink_admin**, to access the Flink web UI. For details, see :ref:`Authentication Based on Users and Roles `. + +Procedure +--------- + +#. Log in to Manager as user **flink_admin** and choose **Cluster** > **Services** > **Flink**. In the **Basic Information** area, click the link on the right of **Flink WebUI** to access the Flink web UI. + +#. Create a Flink SQL job by referring to :ref:`Creating a Job `. On the job development page, configure the job parameters as follows and start the job. + + Select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + + .. code-block:: + + CREATE TABLE kafka_table ( + user_id STRING, + order_amount DOUBLE, + log_ts TIMESTAMP(3), + WATERMARK FOR log_ts AS log_ts - INTERVAL '5' SECOND + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'user_source', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'testGroup', + 'scan.startup.mode' = 'latest-offset', + 'format' = 'csv', + --Ignore the CSV data that fails to be parsed. + 'csv.ignore-parse-errors' = 'true', --If the data is in JSON format, set 'json.ignore-parse-errors' to true. + 'properties.sasl.kerberos.service.name' = 'kafka', + 'properties.security.protocol' = 'SASL_PLAINTEXT', + 'properties.kerberos.domain.name' = 'hadoop.System domain name' + + ); + + CREATE TABLE fs_table ( + user_id STRING, + order_amount DOUBLE, + dt STRING, + `hour` STRING + ) PARTITIONED BY (dt, `hour`) WITH ( --Date-specific file partitioning + 'connector'='filesystem', + 'path'='hdfs:///sql/parquet', + 'format'='parquet', + 'sink.partition-commit.delay'='1 h', + 'sink.partition-commit.policy.kind'='success-file' + ); + -- streaming sql, insert into file system table + INSERT INTO fs_table SELECT user_id, order_amount, DATE_FORMAT(log_ts, 'yyyy-MM-dd'), DATE_FORMAT(log_ts, 'HH') FROM kafka_table; + + .. note:: + + Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default).
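+ + For illustration only, with two placeholder broker instances at 192.168.0.21 and 192.168.0.22 in security mode, the Kafka address in the table definition would be written as follows: + + .. code-block:: + + 'properties.bootstrap.servers' = '192.168.0.21:21007,192.168.0.22:21007',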
+ + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + +#. On the job management page, check whether the job status is **Running**. + +#. Execute the following commands to view the topic and write data to Kafka. For details, see :ref:`Managing Messages in Kafka Topics `. + + **./kafka-topics.sh --list --zookeeper** *IP address of the ZooKeeper quorumpeer instance*:*ZooKeeper port number*\ **/kafka** + + **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance is located:Kafka port number* **--topic** *Topic name* --**producer.config** *Client directory*/**Kafka/kafka/config/producer.properties** + + For example, if the topic name is **user_source**, the script is **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance is located:Kafka port number* **--topic user_source** --**producer.config** **/opt/Bigdata/client/Kafka/kafka/config/producer.properties**. + + Enter the message content. + + .. code-block:: + + 3,3333,"2021-09-10 14:00" + 4,4444,"2021-09-10 14:01" + + Press **Enter** to send the message. + + .. note:: + + - IP address of the ZooKeeper quorumpeer instance + + To obtain IP addresses of all ZooKeeper quorumpeer instances, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Instance** and view the IP addresses of all the hosts where the quorumpeer instances are located. + + - Port number of the ZooKeeper client + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Configurations** and check the value of **clientPort**. The default value is **24002**. + +#. Run the following command to check whether data has been written to the HDFS directory of the sink table: + + **hdfs dfs -ls -R /sql/parquet** + +Interconnecting Flink with HDFS Partitions +------------------------------------------ + +- Customized partitioning + + Flink's file system connector supports partitions in the standard Hive format. You do not need to register partitions with a table catalog. Partitions are inferred based on the directory structure. + + For example, a table that is partitioned based on the following directory is inferred to contain datetime and hour partitions. + + .. code-block:: + + path + └── datetime=2021-09-03 + └── hour=11 + ├── part-0.parquet + ├── part-1.parquet + └── hour=12 + ├── part-0.parquet + └── datetime=2021-09-24 + └── hour=6 + ├── part-0.parquet + +- Rolling policy of partition files + + Data in the partition directories is split into part files. Each partition contains at least one part file, which is used to receive the data written by the subtask of the sink. + + The following parameters describe the rolling policies of partition files.
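+ + As a sketch only, these options (described in the table that follows) could be added to the **fs_table** WITH clause from the preceding example; the values shown here are simply the defaults: + + .. code-block:: + + 'sink.rolling-policy.file-size' = '128MB', + 'sink.rolling-policy.rollover-interval' = '30 min', + 'sink.rolling-policy.check-interval' = '1 min'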
+ + +---------------------------------------+---------------+------------+---------------------------------------------------------------------------+ + | Parameter | Default Value | Type | Description | + +=======================================+===============+============+===========================================================================+ + | sink.rolling-policy.file-size | 128 MB | MemorySize | Maximum size of a partition file before it is rolled. | + +---------------------------------------+---------------+------------+---------------------------------------------------------------------------+ + | sink.rolling-policy.rollover-interval | 30 minutes | Duration | Maximum duration that a partition file can stay open before it is rolled. | + +---------------------------------------+---------------+------------+---------------------------------------------------------------------------+ + | sink.rolling-policy.check-interval | 1 minute | Duration | Interval for checking time-based rolling policies. | + +---------------------------------------+---------------+------------+---------------------------------------------------------------------------+ + +- File merging + + File compression is supported, allowing applications to have a shorter checkpoint interval without generating a large number of files. + + .. note:: + + Only files in a single checkpoint are compressed. That is, the number of generated files is at least the same as the number of checkpoints. Files are invisible before merged. They are visible after both the checkpoint and compression are complete. If file compression takes too much time, the checkpoint will be prolonged. + + +----------------------+---------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Type | Description | + +======================+===============+============+===========================================================================================================================================================================================================================================+ + | auto-compaction | false | Boolean | Whether to enable automatic compression. Data will be written to temporary files. After a checkpoint is complete, the temporary files generated by the checkpoint are compressed. These temporary files are invisible before compression. | + +----------------------+---------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compaction.file-size | none | MemorySize | Size of the target file to be compressed. The default value is the size of the file to be rolled. | + +----------------------+---------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- Partition commit + + After a file is written to a partition, for example, a partition is added to Hive metastore (HMS) or a **\_SUCCESS** file is written to a directory, the downstream application needs to be notified. 
Triggers and policies are used to commit partition files. + + - Trigger parameters for committing partition files + + +-------------------------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Type | Description | + +===============================+=================+=================+===========================================================================================================================================================================================================================================================================+ + | sink.partition-commit.trigger | process-time | String | - process-time: System time of the compute node. It does not need to extract the partition time or generate watermarks. If the current system time exceeds the system time generated when a partition is created plus the delay time, the partition should be submitted. | + | | | | - partition-time: Time extracted from the partition. Watermarks are required. If the time for generating watermarks exceeds the time extracted from a partition plus the delay time, the partition should be submitted. | + +-------------------------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | sink.partition-commit.delay | 0 s | Duration | Partitions will not be committed before the delay time. If it is a daily partition, the value is **1 d**. If it is an hourly one, the value is **1 h**. | + +-------------------------------+-----------------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + - Policy parameters for committing partition files + + +-----------------------------------------+-----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Type | Description | + +=========================================+=================+=================+============================================================================================================================================================================+ + | sink.partition-commit.policy.kind | ``-`` | String | Policy for committing partitions: | + | | | | | + | | | | - **metastore**: used to add partitions to metastore. Only Hive tables support the metastore policy. The file system manages partitions based on the directory structure. | + | | | | - **success-file**: used to add **success-file** files to a directory. | + | | | | - The two policies can be configured at the same time, that is, **'sink.partition-commit.policy.kind'='metastore,success-file'**. 
| + +-----------------------------------------+-----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | sink.partition-commit.policy.class | ``-`` | String | Class that implements partition commit policy interfaces. | + | | | | | + | | | | This parameter takes effect only in the customized submission policies. | + +-----------------------------------------+-----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | sink.partition-commit.success-file.name | \_SUCCESS | String | File name of the success-file partition commit policy. The default value is **\_SUCCESS**. | + +-----------------------------------------+-----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hive.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hive.rst new file mode 100644 index 0000000..44df553 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hive.rst @@ -0,0 +1,190 @@ +:original_name: mrs_01_24179.html + +.. _mrs_01_24179: + +Interconnecting FlinkServer with Hive +===================================== + +Scenario +-------- + +Currently, FlinkServer interconnects with Hive MetaStore. Therefore, the MetaStore function must be enabled for Hive. Hive can be used as source, sink, and dimension tables. + +If your Kafka cluster is in security mode, the following example SQL statements can be used. + +Prerequisites +------------- + +- Services such as HDFS, Yarn, Kafka, Flink, and Hive have been installed in the cluster. +- The client that contains the Hive service has been installed, for example, in the **/opt/Bigdata/client** directory. +- Flink 1.12.2 or later and Hive 3.1.0 or later are supported. +- You have created a user with **FlinkServer Admin Privilege**, for example, **flink_admin**, to access the Flink web UI. For details, see :ref:`Authentication Based on Users and Roles `. +- You have obtained the client configuration file and credential of the user for accessing the Flink web UI. For details, see "Note" in :ref:`Creating a Cluster Connection `. + +Procedure +--------- + +The following uses the process of interconnecting a Kafka mapping table to Hive as an example. + +#. Log in to the Flink web UI as user **flink_admin**. For details, see :ref:`Accessing the Flink Web UI `. + +#. .. _mrs_01_24179__en-us_topic_0000001173470662_li159021532193916: + + Create a cluster connection, for example, **flink_hive**. + + a. Choose **System Management** > **Cluster Connection Management**. The **Cluster Connection Management** page is displayed. + + b. Click **Create Cluster Connection**. On the displayed page, enter information by referring to :ref:`Table 1 ` and click **Test**. After the test is successful, click **OK**. + + .. 
_mrs_01_24179__en-us_topic_0000001173470662_table134890201518: + + .. table:: **Table 1** Parameters for creating a cluster connection + + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Parameter | Description | Example Value | + +=========================+============================================================================================================================================================================================+====================================+ + | Cluster Connection Name | Name of the cluster connection, which can contain a maximum of 100 characters. Only letters, digits, and underscores (_) are allowed. | flink_hive | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Description | Description of the cluster connection name. | ``-`` | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Version | Select a cluster version. | MRS 3 | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Secure Version | - If the secure version is used, select **Yes** for a security cluster. Enter the username and upload the user credential. | Yes | + | | - If not, select **No**. | | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Username | The user must have the minimum permissions for accessing services in the cluster. The name can contain a maximum of 100 characters. Only letters, digits, and underscores (_) are allowed. | flink_admin | + | | | | + | | This parameter is available only when **Secure Version** is set to **Yes**. | | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | Client Profile | Client profile of the cluster, in TAR format. | ``-`` | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + | User Credential | User authentication credential in FusionInsight Manager in TAR format. | User credential of **flink_admin** | + | | | | + | | This parameter is available only when **Secure Version** is set to **Yes**. | | + | | | | + | | Files can be uploaded only after the username is entered. 
| | + +-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------+ + +#. Create a Flink SQL job, for example, **flinktest1**. + + a. Click **Job Management**. The job management page is displayed. + + b. Click **Create Job**. On the displayed job creation page, set parameters by referring to :ref:`Table 2 ` and click **OK**. The job development page is displayed. + + .. _mrs_01_24179__en-us_topic_0000001173470662_table25451917135812: + + .. table:: **Table 2** Parameters for creating a job + + +-------------+----------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +=============+================================================================================================================+===============+ + | Type | Job type, which can be **Flink SQL** or **Flink Jar**. | Flink SQL | + +-------------+----------------------------------------------------------------------------------------------------------------+---------------+ + | Name | Job name, which can contain a maximum of 64 characters. Only letters, digits, and underscores (_) are allowed. | flinktest1 | + +-------------+----------------------------------------------------------------------------------------------------------------+---------------+ + | Task Type | Type of the job data source, which can be a stream job or a batch job. | Stream job | + +-------------+----------------------------------------------------------------------------------------------------------------+---------------+ + | Description | Job description, which can contain a maximum of 100 characters. | ``-`` | + +-------------+----------------------------------------------------------------------------------------------------------------+---------------+ + +#. On the job development page, enter the following statements and click **Check Semantic** to check the input content. + + .. 
code-block:: + + CREATE TABLE test_kafka ( + user_id varchar, + item_id varchar, + cat_id varchar, + zw_test timestamp + ) WITH ( + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'format' = 'json', + 'topic' = 'zw_tset_kafka', + 'connector' = 'kafka', + 'scan.startup.mode' = 'latest-offset', + 'properties.sasl.kerberos.service.name' = 'kafka', + 'properties.security.protocol' = 'SASL_PLAINTEXT', + 'properties.kerberos.domain.name' = 'hadoop.System domain name' + + ); + CREATE CATALOG myhive WITH ( + 'type' = 'hive', + 'hive-version' = '3.1.0', + 'default-database' = 'default', + 'cluster.name' = 'flink_hive' + ); + use catalog myhive; + set table.sql-dialect = hive;create table user_behavior_hive_tbl_no_partition ( + user_id STRING, + item_id STRING, + cat_id STRING, + ts timestamp + ) PARTITIONED BY (dy STRING, ho STRING, mi STRING) stored as textfile TBLPROPERTIES ( + 'partition.time-extractor.timestamp-pattern' = '$dy $ho:$mi:00', + 'sink.partition-commit.trigger' = 'process-time', + 'sink.partition-commit.delay' = '0S', + 'sink.partition-commit.policy.kind' = 'metastore,success-file' + ); + INSERT into + user_behavior_hive_tbl_no_partition + SELECT + user_id, + item_id, + cat_id, + zw_test, + DATE_FORMAT(zw_test, 'yyyy-MM-dd'), + DATE_FORMAT(zw_test, 'HH'), + DATE_FORMAT(zw_test, 'mm') + FROM + default_catalog.default_database.test_kafka; + + .. note:: + + - Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default). + + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + + - The value of **'cluster.name'** is the name of the cluster connection created in :ref:`2 `. + +#. After the job is developed, select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + +#. Click **Submit** in the upper left corner to submit the job. + +#. After the job is successfully executed, choose **More** > **Job Monitoring** to view the job running details. + +#. Execute the following commands to view the topic and write data to Kafka. For details, see :ref:`Managing Messages in Kafka Topics `. + + **./kafka-topics.sh --list --zookeeper** *IP address of the ZooKeeper quorumpeer instance*:*ZooKeeper port number*\ **/kafka** + + **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic** *Topic name* --**producer.config** *Client directory*/**Kafka/kafka/config/producer.properties** + + For example, if the topic name is **zw_tset_kafka**, the script is **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic zw_tset_kafka** --**producer.config** **/opt/Bigdata/client/Kafka/kafka/config/producer.properties**. + + Enter the message content. + + .. code-block:: + + {"user_id": "3","item_id":"333333","cat_id":"cat333","zw_test":"2021-09-08 09:08:01"} + {"user_id": "4","item_id":"444444","cat_id":"cat444","zw_test":"2021-09-08 09:08:01"} + + Press **Enter** to send the message. + + .. 
note:: + + - IP address of the ZooKeeper quorumpeer instance + + To obtain IP addresses of all ZooKeeper quorumpeer instances, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Instance** and view the IP addresses of all the hosts where the quorumpeer instances are located. + + - Port number of the ZooKeeper client + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Configurations** and check the value of **clientPort**. The default value is **24002**. + +#. Run the following commands to check whether data has been written to the Hive sink table: + + **beeline** + + **select \* from user_behavior_hive_tbl_no_partition;** diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst new file mode 100644 index 0000000..0fb1c0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst @@ -0,0 +1,185 @@ +:original_name: mrs_01_24180.html + +.. _mrs_01_24180: + +Interconnecting FlinkServer with Hudi +===================================== + +Scenario +-------- + +This section describes how to interconnect FlinkServer with Hudi through Flink SQL jobs. + +Prerequisites +------------- + +- The HDFS, Yarn, Flink, and Hudi services have been installed in a cluster. +- The client that contains the Hudi service has been installed, for example, in the **/opt/Bigdata/client** directory. +- Flink 1.12.2 or later and Hudi 0.9.0 or later are required. +- You have created a user with **FlinkServer Admin Privilege**, for example, **flink_admin**, to access the Flink web UI. For details, see :ref:`Authentication Based on Users and Roles `. + +Flink Support for Read and Write Operations on Hudi Tables +---------------------------------------------------------- + +:ref:`Table 1 ` lists the read and write operations supported by Flink on Hudi COW and MOR tables. + +.. _mrs_01_24180__en-us_topic_0000001219149723_table1766417313461: + +.. table:: **Table 1** Flink support for read and write operations on Hudi tables + + ============ ========= ========= + Flink SQL    COW table MOR table + ============ ========= ========= + Batch write  Supported Supported + Batch read   Supported Supported + Stream write Supported Supported + Stream read  Supported Supported + ============ ========= ========= + +.. note:: + + Currently, Flink SQL allows you to read data from Hudi tables only in snapshot mode and read optimized mode. + +Procedure +--------- + +#. Log in to Manager as user **flink_admin** and choose **Cluster** > **Services** > **Flink**. In the **Basic Information** area, click the link on the right of **Flink WebUI** to access the Flink web UI. + +#. Create a Flink SQL job by referring to :ref:`Creating a Job `. On the job development page, configure the job parameters as follows and start the job. + + Select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + + .. note:: + + - CheckPoint should be enabled on the Flink web UI because data is written to a Hudi table only when a Flink SQL job triggers CheckPoint.
Adjust the CheckPoint interval based on service requirements. You are advised to set a relatively large interval. + - If the CheckPoint interval is too short, job exceptions may occur due to untimely data updates. It is recommended that the CheckPoint interval be configured at the minute level. + - Asynchronous compaction is required when a Flink SQL job writes an MOR table. For details about the parameter for controlling the compaction interval, visit the Hudi official website at https://hudi.apache.org/docs/configurations.html. + + - The following shows a Flink SQL job writing data to an MOR table in stream mode. Only the Kafka JSON format is supported. + + .. code-block:: + + CREATE TABLE stream_mor( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) PARTITIONED BY (`p`) WITH ( + 'connector' = 'hudi', + 'path' = 'hdfs://hacluster/tmp/hudi/stream_mor', + 'table.type' = 'MERGE_ON_READ' + ); + + CREATE TABLE kafka( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'writehudi', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'testGroup1', + 'scan.startup.mode' = 'latest-offset', + 'format' = 'json' + ); + + insert into + stream_mor + select + * + from + kafka; + + - The following shows a Flink SQL job writing data to a COW table in stream mode: + + .. code-block:: + + CREATE TABLE stream_write_cow( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) PARTITIONED BY (`p`) WITH ( + 'connector' = 'hudi', + 'path' = 'hdfs://hacluster/tmp/hudi/stream_cow' + ); + + CREATE TABLE kafka( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'writehudi', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'testGroup1', + 'scan.startup.mode' = 'latest-offset', + 'format' = 'json' + ); + + insert into + stream_write_cow + select + * + from + kafka; + + - The following shows a Flink SQL job reading an MOR table: + + .. code-block:: + + CREATE TABLE hudi_read_spark_mor( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) PARTITIONED BY (`p`) WITH ( + 'connector' = 'hudi', + 'path' = 'hdfs://hacluster/tmp/default/tb_hudimor', + 'table.type' = 'MERGE_ON_READ' + ); + + CREATE TABLE kafka( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'writehudi', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'testGroup1', + 'scan.startup.mode' = 'latest-offset', + 'format' = 'json' + ); + + insert into + hudi_read_spark_mor + select + * + from + kafka; + + .. note:: + + Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default). + + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + +#.
After data is written to the Hudi table by a Flink SQL job and is read by Spark and Hive, use **run_hive_sync_tool.sh** to synchronize the data in the Hudi table to Hive. For details about the synchronization method, see :ref:`Synchronizing Hudi Table Data to Hive `. + + .. important:: + + Ensure that no partition is added before the synchronization. After the synchronization, new partitions cannot be read. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_kafka.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_kafka.rst new file mode 100644 index 0000000..fbb7e9f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_kafka.rst @@ -0,0 +1,139 @@ +:original_name: mrs_01_24248.html + +.. _mrs_01_24248: + +Interconnecting FlinkServer with Kafka +====================================== + +Scenario +-------- + +This section describes the data definition language (DDL) of Kafka as a source or sink table, as well as the WITH parameters and example code for creating a table, and provides guidance on how to perform operations on the FlinkServer job management page. + +If your Kafka cluster is in security mode, the following example SQL statements can be used. + +Prerequisites +------------- + +- The HDFS, Yarn, Kafka, and Flink services have been installed in a cluster. +- The client that contains the Kafka service has been installed, for example, in the **/opt/Bigdata/client** directory. +- You have created a user with **FlinkServer Admin Privilege**, for example, **flink_admin**, to access the Flink web UI. For details, see :ref:`Authentication Based on Users and Roles `. + +Procedure +--------- + +#. Log in to Manager as user **flink_admin** and choose **Cluster** > **Services** > **Flink**. In the **Basic Information** area, click the link on the right of **Flink WebUI** to access the Flink web UI. + +#. Create a Flink SQL job by referring to :ref:`Creating a Job `. On the job development page, configure the job parameters as follows and start the job. + + Select **Enable CheckPoint** in **Running Parameter** and set **Time Interval (ms)** to **60000**. + + .. code-block:: + + CREATE TABLE KafkaSource ( + `user_id` VARCHAR, + `user_name` VARCHAR, + `age` INT + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'test_source', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'properties.group.id' = 'testGroup', + 'scan.startup.mode' = 'latest-offset', + 'format' = 'csv', + 'properties.sasl.kerberos.service.name' = 'kafka', + 'properties.security.protocol' = 'SASL_PLAINTEXT', + 'properties.kerberos.domain.name' = 'hadoop.System domain name' + ); + CREATE TABLE KafkaSink( + `user_id` VARCHAR, + `user_name` VARCHAR, + `age` INT + ) WITH ( + 'connector' = 'kafka', + 'topic' = 'test_sink', + 'properties.bootstrap.servers' = 'IP address of the Kafka broker instance:Kafka port number', + 'scan.startup.mode' = 'latest-offset', + 'value.format' = 'csv', + 'properties.sasl.kerberos.service.name' = 'kafka', + 'properties.security.protocol' = 'SASL_PLAINTEXT', + 'properties.kerberos.domain.name' = 'hadoop.System domain name' + ); + Insert into + KafkaSink + select + * + from + KafkaSource; + + .. 
note:: + + Kafka port number + + - In security mode, the port number is the value of **sasl.port** (**21007** by default). + + - In non-security mode, the port is the value of **port** (**9092** by default). If the port number is set to **9092**, set **allow.everyone.if.no.acl.found** to **true**. The procedure is as follows: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Kafka**. On the displayed page, click **Configurations** and then **All Configurations**, search for **allow.everyone.if.no.acl.found**, set its value to **true**, and click **Save**. + +#. On the job management page, check whether the job status is **Running**. + +#. Execute the following commands to view the topic and write data to Kafka. For details, see :ref:`Managing Messages in Kafka Topics `. + + **./kafka-topics.sh --list --zookeeper** *IP address of the ZooKeeper quorumpeer instance*:*ZooKeeper port number*\ **/kafka** + + **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic** *Topic name* --**producer.config** *Client directory*/**Kafka/kafka/config/producer.properties** + + For example, if the topic name is **test_source**, the script is **sh kafka-console-producer.sh --broker-list** *IP address of the node where the Kafka instance locates:Kafka port number* **--topic test_source** --**producer.config** **/opt/Bigdata/client/Kafka/kafka/config/producer.properties**. + + Enter the message content. + + .. code-block:: + + 1,clw,33 + + Press **Enter** to send the message. + + .. note:: + + - IP address of the ZooKeeper quorumpeer instance + + To obtain IP addresses of all ZooKeeper quorumpeer instances, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Instance** and view the IP addresses of all the hosts where the quorumpeer instances locate. + + - Port number of the ZooKeeper client + + Log in to FusionInsight Manager and choose **Cluster** > **Service** > **ZooKeeper**. On the displayed page, click **Configurations** and check the value of **clientPort**. The default value is **24002**. + +#. Run the following commands to check whether data is written from the Kafka topic to the sink table: + + **sh kafka-console-consumer.sh --topic** *Topic name* **--bootstrap-server** *IP address of the Kafka broker instance*:**Kafka port number** --**consumer.config /opt/Bigdata/client/Kafka/kafka/config/consumer.properties** + +WITH Parameters +--------------- + ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Parameter | Mandatory | Type | Description | ++==============================+============================================+=================+==========================================================================================================================================================================================================================+ +| connector | Yes | String | Connector to be used. **kafka** is used for Kafka. 
| ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| topic | - Yes (Kafka functions as a sink table.) | String | Topic name. | +| | - No (Kafka functions as a source table.) | | | +| | | | - When the Kafka is used as a source table, this parameter indicates the name of the topic from which data is read. Topic list is supported. Topics are separated by semicolons (;), for example, **Topic-1; Topic-2**. | +| | | | - When Kafka is used as a sink table, this parameter indicates the name of the topic to which data is written. Topic list is not supported for sinks. | ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| topic-pattern | No (Kafka functions as a source table.) | String | Topic pattern. | +| | | | | +| | | | This parameter is available when Kafka is used as a source table. The topic name must be a regular expression. | +| | | | | +| | | | .. note:: | +| | | | | +| | | | **topic-pattern** and **topic** cannot be set at the same time. | ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| properties.bootstrap.servers | Yes | String | List of Kafka brokers, which are separated by commas (,). | ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| properties.group.id | Yes (Kafka functions as a source table.) | String | Kafka user group ID. | ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| format | Yes | String | Format of the value used for deserializing and serializing Kafka messages. | ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| properties.\* | No | String | Authentication-related parameters that need to be added in security mode. 
| ++------------------------------+--------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst new file mode 100644 index 0000000..d33634c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst @@ -0,0 +1,209 @@ +:original_name: mrs_01_24024.html + +.. _mrs_01_24024: + +Managing Jobs on the Flink Web UI +================================= + +Scenario +-------- + +Define Flink jobs, including Flink SQL and Flink JAR jobs. + +.. _mrs_01_24024__en-us_topic_0000001173470782_section1746418521537: + +Creating a Job +-------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. + +#. Click **Job Management**. The job management page is displayed. + +#. Click **Create Job**. On the displayed job creation page, set parameters by referring to :ref:`Table 1 ` and click **OK**. The job development page is displayed. + + .. _mrs_01_24024__en-us_topic_0000001173470782_table25451917135812: + + .. table:: **Table 1** Parameters for creating a job + + +-------------+----------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=============+================================================================================================================+ + | Type | Job type, which can be **Flink SQL** or **Flink Jar**. | + +-------------+----------------------------------------------------------------------------------------------------------------+ + | Name | Job name, which can contain a maximum of 64 characters. Only letters, digits, and underscores (_) are allowed. | + +-------------+----------------------------------------------------------------------------------------------------------------+ + | Task Type | Type of the job data source, which can be a stream job or a batch job. | + +-------------+----------------------------------------------------------------------------------------------------------------+ + | Description | Job description, which can contain a maximum of 100 characters. | + +-------------+----------------------------------------------------------------------------------------------------------------+ + +#. .. _mrs_01_24024__en-us_topic_0000001173470782_li3175133444316: + + (Optional) If you need to develop a job immediately, configure the job on the job development page. + + - .. _mrs_01_24024__en-us_topic_0000001173470782_li1375424453411: + + Creating a Flink SQL job + + a. Develop the job on the job development page. + + |image1| + + b. Click **Check Semantic** to check the input content and click **Format SQL** to format SQL statements. + + c. After the job SQL statements are developed, set running parameters by referring to :ref:`Table 2 ` and click **Save**. + + .. _mrs_01_24024__en-us_topic_0000001173470782_table4292165617332: + + .. 
table:: **Table 2** Running parameters + + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===============================================================================================================================================================================================================+ + | Parallelism | Number of concurrent jobs. The value must be a positive integer containing a maximum of 64 characters. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Maximum Operator Parallelism | Maximum parallelism of operators. The value must be a positive integer containing a maximum of 64 characters. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | JobManager Memory (MB) | Memory of JobManager The minimum value is **512** and the value can contain a maximum of 64 characters. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Submit Queue | Queue to which a job is submitted. If this parameter is not set, the **default** queue is used. The queue name can contain a maximum of 30 characters. Only letters, digits, and underscores (_) are allowed. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | taskManager | taskManager running parameters include: | + | | | + | | - **Slots**: If this parameter is left blank, the default value **1** is used. | + | | - **Memory (MB)**: The minimum value is **512**. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enable CheckPoint | Whether to enable CheckPoint. After CheckPoint is enabled, you need to configure the following information: | + | | | + | | - **Time Interval (ms)**: This parameter is mandatory. | + | | | + | | - **Mode**: This parameter is mandatory. | + | | | + | | The options are **EXACTLY_ONCE** and **AT_LEAST_ONCE**. | + | | | + | | - **Minimum Interval (ms)**: The minimum value is **10**. | + | | | + | | - **Timeout Duration**: The minimum value is **10**. | + | | | + | | - **Maximum Parallelism**: The value must be a positive integer containing a maximum of 64 characters. | + | | | + | | - **Whether to clean up**: This parameter can be set to **Yes** or **No**. | + | | | + | | - **Whether to enable incremental checkpoints**: This parameter can be set to **Yes** or **No**. 
| + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Failure Recovery Policy | Failure recovery policy of a job. The options are as follows: | + | | | + | | - **fixed-delay**: You need to configure **Retry Times** and **Retry Interval (s)**. | + | | - **failure-rate**: You need to configure **Max Retry Times**, **Interval (min)**, and **Retry Interval (s)**. | + | | - **none** | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + d. Click **Submit** in the upper left corner to submit the job. + + - Creating a Flink JAR job + + a. Click **Select**, upload a local JAR file, and set parameters by referring to :ref:`Table 3 `. + + + .. figure:: /_static/images/en-us_image_0000001349059937.png + :alt: **Figure 1** Creating a Flink JAR job + + **Figure 1** Creating a Flink JAR job + + .. _mrs_01_24024__en-us_topic_0000001173470782_table1388311381402: + + .. table:: **Table 3** Parameter configuration + + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===============================================================================================================================================================================================================+ + | Local .jar File | Upload a local JAR file. The size of the file cannot exceed 10 MB. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Main Class | Main-Class type. | + | | | + | | - **Default**: By default, the class name is specified based on the **Mainfest** file in the JAR file. | + | | - **Specify**: Manually specify the class name. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Class name. | + | | | + | | This parameter is available when **Main Class** is set to **Specify**. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Class Parameter | Class parameters of Main-Class (parameters are separated by spaces). | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parallelism | Number of concurrent jobs. The value must be a positive integer containing a maximum of 64 characters. 
| + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | JobManager Memory (MB) | Memory of JobManager The minimum value is **512** and the value can contain a maximum of 64 characters. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Submit Queue | Queue to which a job is submitted. If this parameter is not set, the **default** queue is used. The queue name can contain a maximum of 30 characters. Only letters, digits, and underscores (_) are allowed. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | taskManager | taskManager running parameters include: | + | | | + | | - **Slots**: If this parameter is left blank, the default value **1** is used. | + | | - **Memory (MB)**: The minimum value is **512**. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + b. Click **Save** to save the configuration and click **Submit** to submit the job. + +#. Return to the job management page. You can view information about the created job, including job name, type, status, kind, and description. + +Starting a Job +-------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the job to be started, click **Start** to run the job. Jobs in the **Draft**, **Saved**, **Submission failed**, **Running succeeded**, **Running failed**, or **Stop** state can be started. + +Developing a Job +---------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the job to be developed, click **Develop** to go to the job development page. Develop a job by referring to :ref:`4 `. You can view created stream tables and fields in the list on the left. + +Editing the Job Name and Description +------------------------------------ + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the item to be modified, click **Edit**, modify **Description**, and click **OK** to save the modification. + +Viewing Job Details +------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the item to be viewed, choose **More** > **Job Monitoring** to view the job running details. + + .. note:: + + You can only view details about jobs in the **Running** state. + +Checkpoint Failure Recovery +--------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. 
Click **Job Management**. The job management page is displayed. +#. In the Operation column of the item to be restored, click **More** > **Checkpoint Failure Recovery**. You can perform checkpoint failure recovery for jobs in the **Running failed**, **Running Succeeded**, or **Stop** state. + +Filtering/Searching for Jobs +---------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the upper right corner of the page, you can obtain job information by selecting the job name, or enter a keyword to search for a job. + +Stopping a Job +-------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the item to be stopped, click **Stop**. Jobs in the **Submitting**, **Submission succeeded**, or **Running** state can be stopped. + +Deleting a Job +-------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Job Management**. The job management page is displayed. +#. In the **Operation** column of the item to be deleted, click **Delete**, and click **OK** in the displayed page. Jobs in the **Draft**, **Saved**, **Submission failed**, **Running succeeded**, **Running failed**, or **Stop** state can be deleted. + +.. |image1| image:: /_static/images/en-us_image_0000001387905484.png diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_tables_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_tables_on_the_flink_web_ui.rst new file mode 100644 index 0000000..d78a45d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_tables_on_the_flink_web_ui.rst @@ -0,0 +1,99 @@ +:original_name: mrs_01_24023.html + +.. _mrs_01_24023: + +Managing Tables on the Flink Web UI +=================================== + +Scenario +-------- + +Data tables can be used to define basic attributes and parameters of source tables, dimension tables, and output tables. + +Creating a Stream Table +----------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. + +#. Click **Table Management**. The table management page is displayed. + +#. Click **Create Stream Table**. On the stream table creation page, set parameters by referring to :ref:`Table 1 ` and click **OK**. + + .. _mrs_01_24023__en-us_topic_0000001173789522_table205858588169: + + .. 
table:: **Table 1** Parameters for creating a stream table + + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Remarks | + +========================+====================================================================================================================================================================================================================================+=====================================================================================================================================================+ + | Stream/Table Name | Stream/Table name, which can contain 1 to 64 characters. Only letters, digits, and underscores (_) are allowed. | Example: **flink_sink** | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Stream/Table description information, which can contain 1 to 1024 characters. | ``-`` | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Mapping Table Type | Flink SQL does not provide the data storage function. Table creation is actually the creation of mapping for external data tables or storage. | ``-`` | + | | | | + | | The value can be **Kafka**, **HDFS**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Includes the data source table **Source**, data result table **Sink**. Tables included in different mapping table types are as follows: | ``-`` | + | | | | + | | - Kafka: **Source** and **Sink** | | + | | - HDFS: **Source** and **Sink** | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Data Connection | Name of the data connection. 
| ``-`` | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Topic | Kafka topic to be read. Multiple Kafka topics can be read. Use separators to separate topics. | ``-`` | + | | | | + | | This parameter is available when **Mapping Table Type** is set to **Kafka**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | File Path | HDFS directory or a single file path to be transferred. | Example: | + | | | | + | | This parameter is available when **Mapping Table Type** is set to **HDFS**. | **/user/sqoop/** or **/user/sqoop/example.csv** | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Code | Codes corresponding to different mapping table types are as follows: | ``-`` | + | | | | + | | - Kafka: **CSV** and **JSON** | | + | | - HDFS: **CSV** | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Prefix | When **Mapping Table Type** is set to **Kafka**, **Type** is set to **Source**, and **Code** is set to **JSON**, this parameter indicates the hierarchical prefixes of multi-layer nested JSON, which are separated by commas (,). | For example, **data,info** indicates that the content under **data** and **info** in the nested JSON file is used as the data input in JSON format. | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Separator | Meanings of this parameter corresponding to different mapping table types are as follows: This parameter is used to specify the separator between CSV fields. 
This parameter is available when **Code** is set to **CSV.** | Example: comma (**,**) | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Row Separator | Line break in the file, including **\\r**, **\\n**, and **\\r\\n**. | ``-`` | + | | | | + | | This parameter is available when **Mapping Table Type** is set to **HDFS**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Column Separator | Field separator in the file. | Example: comma (**,**) | + | | | | + | | This parameter is available when **Mapping Table Type** is set to **HDFS**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Stream Table Structure | Stream/Table structure, including **Name** and **Type**. | ``-`` | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Proctime | System time, which is irrelevant to the data timestamp. That is, the time when the calculation is complete in Flink operators. | ``-`` | + | | | | + | | This parameter is available when **Type** is set to **Source**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + | Event Time | Time when an event is generated, that is, the timestamp generated during data generation. | ``-`` | + | | | | + | | This parameter is available when **Type** is set to **Source**. | | + +------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+ + +Editing a Stream Table +---------------------- + +#. Access the Flink web UI. 
For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Table Management**. The table management page is displayed. +#. In the **Operation** column of the item to be modified, click **Edit**. On the displayed page, modify the stream table information by referring to :ref:`Table 1 ` and click **OK**. + +Searching for a stream table +---------------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Table Management**. The table management page is displayed. +#. In the upper right corner of the page, you can enter a keyword to search for stream table information. + +Deleting a Stream Table +----------------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. +#. Click **Table Management**. The table management page is displayed. +#. In the **Operation** column of the item to be deleted, click **Delete**, and click **OK** in the displayed page. diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst new file mode 100644 index 0000000..bcccb93 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24223.html + +.. _mrs_01_24223: + +Managing UDFs on the Flink Web UI +================================= + +- :ref:`Managing UDFs on the Flink Web UI ` +- :ref:`UDF Java and SQL Examples ` +- :ref:`UDAF Java and SQL Examples ` +- :ref:`UDTF Java and SQL Examples ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + managing_udfs_on_the_flink_web_ui + udf_java_and_sql_examples + udaf_java_and_sql_examples + udtf_java_and_sql_examples diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/managing_udfs_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/managing_udfs_on_the_flink_web_ui.rst new file mode 100644 index 0000000..2c36e38 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/managing_udfs_on_the_flink_web_ui.rst @@ -0,0 +1,78 @@ +:original_name: mrs_01_24211.html + +.. _mrs_01_24211: + +Managing UDFs on the Flink Web UI +================================= + +You can customize functions to extend SQL statements to meet personalized requirements. These functions are called user-defined functions (UDFs). You can upload and manage UDF JAR files on the Flink web UI and call UDFs when running jobs. + +Flink supports the following three types of UDFs, as described in :ref:`Table 1 `. + +.. _mrs_01_24211__en-us_topic_0000001173470778_table6945142055011: + +.. table:: **Table 1** Function classification + + +-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Description | + +===========================================+================================================================================================================================================+ + | User-defined Scalar function (UDF) | Supports one or more input parameters and returns a single result value. 
For details, see :ref:`UDF Java and SQL Examples `. | + +-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + | User-defined aggregation function (UDAF) | Aggregates multiple records into one value. For details, see :ref:`UDAF Java and SQL Examples `. | + +-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + | User-defined table-valued function (UDTF) | Supports one or more input parameters and returns multiple rows or columns. For details, see :ref:`UDTF Java and SQL Examples `. | + +-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ + +Prerequisites +------------- + +You have prepared a UDF JAR file whose size does not exceed 200 MB. + +.. _mrs_01_24211__en-us_topic_0000001173470778_section117082071450: + +Uploading a UDF +--------------- + +#. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. + +#. Click **UDF Management**. The **UDF Management** page is displayed. + +#. Click **Add UDF**. Select and upload the prepared UDF JAR file for **Local .jar File**. + +#. Enter the UDF name and description and click **OK**. + + .. note:: + + A maximum of 10 UDF names can be added. **Name** can be customized. **Type** must correspond to the UDF function in the uploaded UDF JAR file. + +#. In the UDF list, you can view information about all UDFs in the current application. + +#. (Optional) If you need to run or develop a job immediately, configure the job on the **Job Management** page. + + Click **Job Management**. The job management page is displayed. + + - Starting a UDF job: In the **Operation** column of the UDF job, click **Start**. + - Developing a UDF job: In the **Operation** column of the UDF job, click **Develop**. For details about related parameters, see :ref:`Creating a Flink SQL job `. + - Stopping a UDF job: In the **Operation** column of the UDF job, click **Stop**. + - Deleting a UDF job: In the **Operation** column of the UDF job, click **Delete**. Only jobs in the **Stop** state can be deleted. + - Editing a UDF job: In the **Operation** column of the UDF job, click **Edit**. Only the description of a job can be modified. + - Viewing job details: In the **Operation** column of the UDF job, choose **More** > **Job Monitoring**. + - Performing checkpoint failure recovery: In the **Operation** column of the UDF job, choose **More** > **Checkpoint failure recovery** to recover the fault. You can perform checkpoint failure recovery for jobs in the **Running failed**, **Running Succeeded**, or **Stop** state. + +Editing a UDF +------------- + +#. Upload a UDF JAR file. For details, see :ref:`Uploading a UDF `. +#. In the **Operation** column of the UDF job, click **Edit**. The **Edit UDF** page is displayed. +#. Modify the information and click **OK**. + +Deleting a UDF +-------------- + +#. Upload a UDF JAR file. For details, see :ref:`Uploading a UDF `. +#. In the **Operation** column of the UDF job, click **Delete**. The **Delete UDF** page is displayed. +#. Confirm the information about the UDF to be deleted and click **OK**. + + .. note:: + + Only the UDFs that are not used can be deleted. 
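+
+UDF Type and Base Class Reference
+---------------------------------
+
+The **Type** selected when adding a UDF must correspond to the Flink base class that the function class in the uploaded JAR file extends. The following minimal Java sketch summarizes this mapping; the package and class names are illustrative only, and complete examples are provided in the following sections.
+
+.. code-block::
+
+   package com.xxx.udf;
+
+   import org.apache.flink.api.java.tuple.Tuple2;
+   import org.apache.flink.table.functions.AggregateFunction;
+   import org.apache.flink.table.functions.ScalarFunction;
+   import org.apache.flink.table.functions.TableFunction;
+
+   // UDF (scalar function): one or more input parameters, a single return value per row.
+   class ExampleUdf extends ScalarFunction {
+       public int eval(String s) {
+           return s == null ? 0 : s.length();
+       }
+   }
+
+   // Accumulator used by the example UDAF below.
+   class SumAccumulator {
+       public int sum;
+   }
+
+   // UDAF (aggregation function): aggregates multiple records into one value through an accumulator.
+   class ExampleUdaf extends AggregateFunction<Integer, SumAccumulator> {
+       public void accumulate(SumAccumulator acc, Integer value) {
+           acc.sum += value;
+       }
+       @Override
+       public Integer getValue(SumAccumulator acc) {
+           return acc.sum;
+       }
+       @Override
+       public SumAccumulator createAccumulator() {
+           return new SumAccumulator();
+       }
+   }
+
+   // UDTF (table-valued function): can emit multiple rows or columns for each input row.
+   class ExampleUdtf extends TableFunction<Tuple2<String, Integer>> {
+       public void eval(String str) {
+           collect(Tuple2.of(str, str.length()));
+       }
+   }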
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udaf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udaf_java_and_sql_examples.rst new file mode 100644 index 0000000..298fc7a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udaf_java_and_sql_examples.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_24225.html + +.. _mrs_01_24225: + +UDAF Java and SQL Examples +========================== + +UDAF Java Example +----------------- + +.. code-block:: + + package com.xxx.udf; + import org.apache.flink.table.functions.AggregateFunction; + public class UdfClass_UDAF { + public static class AverageAccumulator { + public int sum; + } + public static class Average extends AggregateFunction { + public void accumulate(AverageAccumulator acc, Integer value) { + acc.sum += value; + } + @Override + public Integer getValue(AverageAccumulator acc) { + return acc.sum; + } + @Override + public AverageAccumulator createAccumulator() { + return new AverageAccumulator(); + } + } + } + +UDAF SQL Example +---------------- + +.. code-block:: + + CREATE TEMPORARY FUNCTION udaf as 'com.xxx.udf.UdfClass_UDAF$Average'; + CREATE TABLE udfSource (a int) WITH ('connector' = 'datagen','rows-per-second'='1','fields.a.min'='1','fields.a.max'='3'); + CREATE TABLE udfSink (b int,c int) WITH ('connector' = 'print'); + INSERT INTO + udfSink + SELECT + a, + udaf(a) + FROM + udfSource group by a; diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udf_java_and_sql_examples.rst new file mode 100644 index 0000000..79d68da --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udf_java_and_sql_examples.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_24224.html + +.. _mrs_01_24224: + +UDF Java and SQL Examples +========================= + +UDF Java Example +---------------- + +.. code-block:: + + package com.xxx.udf; + import org.apache.flink.table.functions.ScalarFunction; + public class UdfClass_UDF extends ScalarFunction { + public int eval(String s) { + return s.length(); + } + } + +UDF SQL Example +--------------- + +.. code-block:: + + CREATE TEMPORARY FUNCTION udf as 'com.xxx..udf.UdfClass_UDF'; + CREATE TABLE udfSource (a VARCHAR) WITH ('connector' = 'datagen','rows-per-second'='1'); + CREATE TABLE udfSink (a VARCHAR,b int) WITH ('connector' = 'print'); + INSERT INTO + udfSink + SELECT + a, + udf(a) + FROM + udfSource; diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udtf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udtf_java_and_sql_examples.rst new file mode 100644 index 0000000..3329bc8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udtf_java_and_sql_examples.rst @@ -0,0 +1,37 @@ +:original_name: mrs_01_24227.html + +.. _mrs_01_24227: + +UDTF Java and SQL Examples +========================== + +UDTF Java Example +----------------- + +.. 
code-block:: + + package com.xxx.udf; + import org.apache.flink.api.java.tuple.Tuple2; + import org.apache.flink.table.functions.TableFunction; + public class UdfClass_UDTF extends TableFunction> { + public void eval(String str) { + Tuple2 tuple2 = Tuple2.of(str, str.length()); + collect(tuple2); + } + } + +UDTF SQL Example +---------------- + +.. code-block:: + + CREATE TEMPORARY FUNCTION udtf as 'com.xxx.udf.UdfClass_UDTF'; + CREATE TABLE udfSource (a VARCHAR) WITH ('connector' = 'datagen','rows-per-second'='1'); + CREATE TABLE udfSink (b VARCHAR,c int) WITH ('connector' = 'print'); + INSERT INTO + udfSink + SELECT + str, + strLength + FROM + udfSource,lateral table(udtf(udfSource.a)) as T(str,strLength); diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/flink_web_ui_application_process.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/flink_web_ui_application_process.rst new file mode 100644 index 0000000..5e7d98f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/flink_web_ui_application_process.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_24017.html + +.. _mrs_01_24017: + +Flink Web UI Application Process +================================ + +The Flink web UI application process is shown as follows: + + +.. figure:: /_static/images/en-us_image_0000001295899948.png + :alt: **Figure 1** Application process + + **Figure 1** Application process + +.. table:: **Table 1** Description of the Flink web UI application process + + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Phase | Description | Reference Section | + +===========================================+==========================================================================================================================+=========================================================================+ + | Creating an application | Applications can be used to isolate different upper-layer services. | :ref:`Creating an Application on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Creating a cluster connection | Different clusters can be accessed by configuring the cluster connection. | :ref:`Creating a Cluster Connection on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Creating a data connection | Different data services can be accessed, such as HDFS, Kafka, through the data connection. | :ref:`Creating a Data Connection on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Creating a stream table | Data tables can be used to define basic attributes and parameters of source tables, dimension tables, and output tables. 
| :ref:`Managing Tables on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Creating a SQL/JAR job (stream/batch job) | APIs can be used to define Flink jobs, including Flink SQL and Flink Jar jobs. | :ref:`Managing Jobs on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Managing a job | A created job can be managed, including starting, developing, stopping, deleting, and editing the job. | :ref:`Managing Jobs on the Flink Web UI ` | + +-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst new file mode 100644 index 0000000..4766f91 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24015.html + +.. _mrs_01_24015: + +Overview +======== + +- :ref:`Introduction to Flink Web UI ` +- :ref:`Flink Web UI Application Process ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + introduction_to_flink_web_ui + flink_web_ui_application_process diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/introduction_to_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/introduction_to_flink_web_ui.rst new file mode 100644 index 0000000..7b297f2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/introduction_to_flink_web_ui.rst @@ -0,0 +1,56 @@ +:original_name: mrs_01_24016.html + +.. _mrs_01_24016: + +Introduction to Flink Web UI +============================ + +Flink web UI provides a web-based visual development platform. You only need to compile SQL statements to develop jobs, slashing the job development threshold. In addition, the exposure of platform capabilities allows service personnel to compile SQL statements for job development to quickly respond to requirements, greatly reducing the Flink job development workload. + +Flink Web UI Features +--------------------- + +The Flink web UI has the following features: + +- Enterprise-class visual O&M: GUI-based O&M management, job monitoring, and standardization of Flink SQL statements for job development. +- Quick cluster connection: After configuring the client and user credential key file, you can quickly access a cluster using the cluster connection function. +- Quick data connection: You can access a component by configuring the data connection function. If **Data Connection Type** is set to **HDFS**, you need to create a cluster connection. If **Authentication Mode** is set to **KERBEROS** for other data connection types, you need to create a cluster connection. If **Authentication Mode** is set to **SIMPLE**, you do not need to create a cluster connection. 
+- Visual development platform: The input/output mapping table can be customized to meet the requirements of different input sources and output destinations. +- Easy to use GUI-based job management + +Key Web UI Capabilities +----------------------- + +:ref:`Table 1 ` shows the key capabilities provided by Flink web UI. + +.. _mrs_01_24016__en-us_topic_0000001173949848_table91592142421: + +.. table:: **Table 1** Key web UI capabilities + + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Item | Description | + +====================================+===========================================================================================================================================================================================================================================+ + | Batch-Stream convergence | - Batch jobs and stream jobs can be processed with a unified set of Flink SQL statements. | + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Flink SQL kernel capabilities | - Flink SQL supports customized window size, stream compute within 24 hours, and batch processing beyond 24 hours. | + | | - Flink SQL supports Kafka and HDFS reading. Data can be written to Kafka, and HDFS. | + | | - A job can define multiple Flink SQL jobs, and multiple metrics can be combined into one job for computing. If a job contains same primary keys as well as same inputs and outputs, the job supports the computing of multiple windows. | + | | - The AVG, SUM, COUNT, MAX, and MIN statistical methods are supported. | + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Flink SQL functions on the console | - Cluster connection management allows you to configure clusters where services such as Kafka, and HDFS reside. | + | | - Data connection management allows you to configure services such as Kafka, and HDFS. | + | | - Data table management allows you to define data tables accessed by SQL statements and generate DDL statements. | + | | - Flink SQL job definition allows you to verify, parse, optimize, convert a job into a Flink job, and submit the job for running based on the entered SQL statements. | + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Flink job visual management | - Stream jobs and batch jobs can be defined in a visual manner. | + | | - Job resources, fault recovery policies, and checkpoint policies can be configured in a visual manner. | + | | - Status monitoring of stream and batch jobs are supported. | + | | - The Flink job O&M is enhanced, including redirection of the native monitoring page. 
| + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Performance and reliability | - Stream processing supports 24-hour window aggregation computing and millisecond-level performance. | + | | - Batch processing supports 90-day window aggregation computing, which can be completed in minutes. | + | | - Invalid data of stream processing and batch processing can be filtered out. | + | | - When HDFS data is read, the data can be filtered based on the calculation period in advance. | + | | - If the job definition platform is faulty or the service is degraded, jobs cannot be redefined, but the computing of existing jobs is not affected. | + | | - The automatic restart mechanism is provided for job failures. You can configure restart policies. | + +------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flink/viewing_flink_job_information.rst b/doc/component-operation-guide-lts/source/using_flink/viewing_flink_job_information.rst new file mode 100644 index 0000000..139f44b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/viewing_flink_job_information.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_0784.html + +.. _mrs_01_0784: + +Viewing Flink Job Information +============================= + +You can view Flink job information on the Yarn web UI. + +Prerequisites +------------- + +The Flink service has been installed in a cluster. + +Accessing the Yarn Web UI +------------------------- + +#. Go to the Yarn service page. + + Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Yarn** > **Instance** > **Dashboard**. + +#. Click the link next to **ResourceManager WebUI** to go to the Yarn web UI page. diff --git a/doc/component-operation-guide-lts/source/using_flume/common_issues_about_flume.rst b/doc/component-operation-guide-lts/source/using_flume/common_issues_about_flume.rst new file mode 100644 index 0000000..d383761 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/common_issues_about_flume.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_1598.html + +.. _mrs_01_1598: + +Common Issues About Flume +========================= + +Flume logs are stored in **/var/log/Bigdata/flume/flume/flumeServer.log**. Most data transmission exceptions and data transmission failures are recorded in logs. You can run the following command: + +**tailf /var/log/Bigdata/flume/flume/flumeServer.log** + +- Problem: After the configuration file is uploaded, an exception occurs. After the configuration file is uploaded again, the scenario requirements are still not met, but no exception is recorded in the log. + + Solution: Restart the Flume process, run the **kill -9** *Process code* to kill the process code, and view the logs. + +- Issue: **"java.lang.IllegalArgumentException: Keytab is not a readable file: /opt/test/conf/user.keytab"** is displayed when HDFS is connected. + + Solution: Grant the read and write permissions to the Flume running user. 
+ +- Problem: The following error is reported when the Flume client is connected to Kafka: + + .. code-block:: + + Caused by: java.io.IOException: /opt/FlumeClient/fusioninsight-flume-1.9.0/cof//jaas.conf (No such file or directory) + + Solution: Add the **jaas.conf** configuration file and save it to the **conf** directory of the Flume client. + + **vi jaas.conf** + + .. code-block:: + + KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + useKeyTab=true + keyTab="/opt/test/conf/user.keytab" + principal="flume_hdfs@" + useTicketCache=false + storeKey=true + debug=true; + }; + + Values of **keyTab** and **principal** vary depending on the actual situation. + +- Problem: The following error is reported when the Flume client is connected to HBase: + + .. code-block:: + + Caused by: java.io.IOException: /opt/FlumeClient/fusioninsight-flume-1.9.0/cof//jaas.conf (No such file or directory) + + Solution: Add the **jaas.conf** configuration file and save it to the **conf** directory of the Flume client. + + **vi jaas.conf** + + .. code-block:: + + Client { + com.sun.security.auth.module.Krb5LoginModule required + useKeyTab=true + keyTab="/opt/test/conf/user.keytab" + principal="flume_hbase@" + useTicketCache=false + storeKey=true + debug=true; + }; + + Values of **keyTab** and **principal** vary depending on the actual situation. + +- Question: After the configuration file is submitted, the Flume Agent occupies resources. How do I restore the Flume Agent to the state when the configuration file is not uploaded? + + Solution: Submit an empty **properties.properties** file. diff --git a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst new file mode 100644 index 0000000..39976b5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1073.html + +.. _mrs_01_1073: + +Configuring the Flume Service Model +=================================== + +- :ref:`Overview ` +- :ref:`Service Model Configuration Guide ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + overview + service_model_configuration_guide diff --git a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst new file mode 100644 index 0000000..f76e1ba --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst @@ -0,0 +1,10 @@ +:original_name: mrs_01_1074.html + +.. _mrs_01_1074: + +Overview +======== + +Guide a reasonable Flume service configuration by providing performance differences between Flume common modules, to avoid a nonstandard overall service performance caused when a frontend Source and a backend Sink do not match in performance. + +Only single channels are compared for description. 
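+
+For reference, a single-channel agent consists of one source writing to one sink through a single channel. The following configuration is a minimal sketch of such an agent used as the comparison baseline; the agent name, component names, paths, and capacities are illustrative and must be adapted to the actual environment.
+
+.. code-block::
+
+   # Minimal single-channel agent: SpoolDir Source -> Memory Channel -> HDFS Sink
+   server.sources = s1
+   server.channels = c1
+   server.sinks = k1
+
+   # Source: reads files placed in a local spooling directory (example path)
+   server.sources.s1.type = spooldir
+   server.sources.s1.spoolDir = /opt/flume/spooldir
+   server.sources.s1.channels = c1
+
+   # Channel: buffers events between the source and the sink
+   server.channels.c1.type = memory
+   server.channels.c1.capacity = 10000
+   server.channels.c1.transactionCapacity = 1000
+
+   # Sink: writes events to HDFS (example path)
+   server.sinks.k1.type = hdfs
+   server.sinks.k1.hdfs.path = hdfs://hacluster/flume/data
+   server.sinks.k1.channel = c1
+
+In such a topology, the sink (HDFS Sink in this sketch) must be able to sustain at least the peak throughput of the source; otherwise, events accumulate in the channel.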
diff --git a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/service_model_configuration_guide.rst b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/service_model_configuration_guide.rst new file mode 100644 index 0000000..9747e28 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/service_model_configuration_guide.rst @@ -0,0 +1,250 @@ +:original_name: mrs_01_1075.html + +.. _mrs_01_1075: + +Service Model Configuration Guide +================================= + +During Flume service configuration and module selection, the ultimate throughput of a sink must be greater than the maximum throughput of a source. Otherwise, in extreme load scenarios, the write speed of the source to a channel is greater than the read speed of sink from channel. Therefore, the channel is fully occupied due to frequent usage, and the performance is affected. The maximum throughput of common modules is: + +Avro Source = Avro Sink > Kafka Source > Kafka Sink > HDFS Sink > SpoolDir Source > HBase Sink > Taildir Source > Solr Sink + +Avro Source and Avro Sink are usually used in pairs to transfer data between multiple Flume Agents. Therefore, Avro Source and Avro Sink do not become a performance bottleneck in general scenarios. + +Inter-Module Performance +------------------------ + +Based on comparison between the limit performances of modules, Kafka Sink and HDFS Sink can meet the throughput requirements when the front-end is SpoolDir Source. However, HBase Sink and Solr Sink could become performance bottlenecks due to the low write performances thereof. As a result, data is stacked in Channel. If you have to use HBase Sink, Solr Sink, or other sinks that are prone to become performance bottlenecks, you can use **Channel Selector** or **Sink Group** to meet performance requirements. + +Channel Selector +---------------- + +A channel selector allows a source to connect to multiple channels. Data of the source can be distributed or copied by selecting different types of selectors. Currently, a channel selector provided by Flume can be a replicating channel selector or a multiplexing channel selector. + +Replicating: indicates that the data of the source is synchronized to all channels. + +Multiplexing: indicates that based on the value of a specific field of the header of an event, a channel is selected to send the data. In this way, the data is distributed based on a service type. + +- Replicating configuration example: + + .. code-block:: + + client.sources = kafkasource + client.channels = channel1 channel2 + client.sources.kafkasource.type = org.apache.flume.source.kafka.KafkaSource + client.sources.kafkasource.kafka.topics = topic1,topic2 + client.sources.kafkasource.kafka.consumer.group.id = flume + client.sources.kafkasource.kafka.bootstrap.servers = 10.69.112.108:21007 + client.sources.kafkasource.kafka.security.protocol = SASL_PLAINTEXT + client.sources.kafkasource.batchDurationMillis = 1000 + client.sources.kafkasource.batchSize = 800 + client.sources.kafkasource.channels = channel1 c el2 + + client.sources.kafkasource.selector.type = replicating + client.sources.kafkasource.selector.optional = channel2 + + .. 
table:: **Table 1** Parameters in the Replicating configuration example + + +-------------------+---------------+-------------------------------------------------------+ + | Parameter | Default Value | Description | + +===================+===============+=======================================================+ + | Selector.type | replicating | Selector type. Set this parameter to **replicating**. | + +-------------------+---------------+-------------------------------------------------------+ + | Selector.optional | ``-`` | Optional channel. Configure this parameter as a list. | + +-------------------+---------------+-------------------------------------------------------+ + +- Multiplexing configuration example: + + .. code-block:: + + client.sources = kafkasource + client.channels = channel1 channel2 + client.sources.kafkasource.type = org.apache.flume.source.kafka.KafkaSource + client.sources.kafkasource.kafka.topics = topic1,topic2 + client.sources.kafkasource.kafka.consumer.group.id = flume + client.sources.kafkasource.kafka.bootstrap.servers = 10.69.112.108:21007 + client.sources.kafkasource.kafka.security.protocol = SASL_PLAINTEXT + client.sources.kafkasource.batchDurationMillis = 1000 + client.sources.kafkasource.batchSize = 800 + client.sources.kafkasource.channels = channel1 channel2 + + client.sources.kafkasource.selector.type = multiplexing + client.sources.kafkasource.selector.header = myheader + client.sources.kafkasource.selector.mapping.topic1 = channel1 + client.sources.kafkasource.selector.mapping.topic2 = channel2 + client.sources.kafkasource.selector.default = channel1 + + .. table:: **Table 2** Parameters in the Multiplexing configuration example + + +---------------------+-----------------------+--------------------------------------------------------+ + | Parameter | Default Value | Description | + +=====================+=======================+========================================================+ + | Selector.type | replicating | Selector type. Set this parameter to **multiplexing**. | + +---------------------+-----------------------+--------------------------------------------------------+ + | Selector.header | Flume.selector.header | ``-`` | + +---------------------+-----------------------+--------------------------------------------------------+ + | Selector.default | ``-`` | ``-`` | + +---------------------+-----------------------+--------------------------------------------------------+ + | Selector.mapping.\* | ``-`` | ``-`` | + +---------------------+-----------------------+--------------------------------------------------------+ + + In a multiplexing selector example, select a field whose name is topic from the header of the event. When the value of the topic field in the header is topic1, send the event to a channel 1; or when the value of the topic field in the header is topic2, send the event to a channel 2. + + Selectors need to use a specific header of an event in a source to select a channel, and need to select a proper header based on a service scenario to distribute data. + +SinkGroup +--------- + +When the performance of a backend single sink is insufficient, and high reliability or heterogeneous output is required, you can use a sink group to connect a specified channel to multiple sinks, thereby meeting use requirements. Currently, Flume provides two types of sink processors to manage sinks in a sink group. The types are load balancing and failover. 
+ +Failover: Indicates that there is only one active sink in the sink group each time, and the other sinks are on standby and inactive. When the active sink becomes faulty, one of the inactive sinks is selected based on priorities to take over services, so as to ensure that data is not lost. This is used in high-reliability scenarios. + +Load balancing: Indicates that all sinks in the sink group are active. Each sink obtains data from the channel and processes the data. In addition, during running, loads of all sinks in the sink group are balanced. This is used in performance improvement scenarios. + +- Load balancing configuration examples: + + .. code-block:: + + client.sources = source1 + client.sinks = sink1 sink2 + client.channels = channel1 + + client.sinkgroups = g1 + client.sinkgroups.g1.sinks = sink1 sink2 + client.sinkgroups.g1.processor.type = load_balance + client.sinkgroups.g1.processor.backoff = true + client.sinkgroups.g1.processor.selector = random + + client.sinks.sink1.type = logger + client.sinks.sink1.channel = channel1 + + client.sinks.sink2.type = logger + client.sinks.sink2.channel = channel1 + + .. table:: **Table 3** Parameters of Load Balancing configuration examples + + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +===============================+===============+==============================================================================================================================+ + | sinks | ``-`` | Specifies the sink list of the sink group. Multiple sinks are separated by spaces. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + | processor.type | default | Specifies the type of a processor. Set this parameter to **load_balance**. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + | processor.backoff | false | Indicates whether to back off failed sinks exponentially. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + | processor.selector | round_robin | Specifies the selection mechanism. It must be round_robin, random, or a customized class that inherits AbstractSinkSelector. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + | processor.selector.maxTimeOut | 30000 | Specifies the time for masking a faulty sink. The default value is 30,000 ms. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------+ + +- Failover configuration examples: + + .. 
code-block:: + + client.sources = source1 + client.sinks = sink1 sink2 + client.channels = channel1 + + client.sinkgroups = g1 + client.sinkgroups.g1.sinks = sink1 sink2 + client.sinkgroups.g1.processor.type = failover + client.sinkgroups.g1.processor.priority.sink1 = 10 + client.sinkgroups.g1.processor.priority.sink2 = 5 + client.sinkgroups.g1.processor.maxpenalty = 10000 + + client.sinks.sink1.type = logger + client.sinks.sink1.channel = channel1 + + client.sinks.sink2.type = logger + client.sinks.sink2.channel = channel1 + + .. table:: **Table 4** Parameters in the **failover** configuration example + + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +===============================+===============+==========================================================================================================================================================================================================================================================================================+ + | sinks | ``-`` | Specifies the sink list of the sink group. Multiple sinks are separated by spaces. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | processor.type | default | Specifies the type of a processor. Set this parameter to **failover**. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | processor.priority. | ``-`` | Priority. **** must be defined in description of sinks. A sink having a higher priority is activated earlier. A larger value indicates a higher priority. **Note**: If there are multiple sinks, their priorities must be different. Otherwise, only one of them takes effect. | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | processor.maxpenalty | 30000 | Specifies the maximum backoff time of failed sinks (unit: ms). | + +-------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Interceptors +------------ + +The Flume interceptor supports modification or discarding of basic unit events during data transmission. 
You can specify the class name list of the interceptors built into Flume, or develop customized interceptors, to modify or discard events. The following table lists the built-in Flume interceptors. A relatively complex example is used in this section; you can configure and use the other interceptors as required. + +.. note:: + + 1. The interceptor is used between the sources and channels of Flume. Most sources provide parameters for configuring interceptors. You can set the parameters as required. + + 2. Flume allows multiple interceptors to be configured for a source. The interceptor names are separated by spaces. + + 3. The interceptors are called in the order in which they are specified. + + 4. The content that an interceptor inserts into the event header can be read and used in the sink. + +.. table:: **Table 5** Types of built-in interceptors in Flume + + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Interceptor Type               | Description                                                                                        | + +================================+====================================================================================================+ + | Timestamp Interceptor          | The interceptor inserts a timestamp into the header of an event.                                   | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Host Interceptor               | The interceptor inserts the IP address or host name of the node where the agent is located into   | + |                                | the header of an event.                                                                            | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Remove Header Interceptor      | The interceptor discards events based on the strings in the event header that match the           | + |                                | configured regular expression.                                                                     | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | UUID Interceptor               | The interceptor generates a UUID string for the header of each event.                              | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Search and Replace Interceptor | The interceptor provides a simple string-based search and replacement function based on Java      | + |                                | regular expressions. The rule is the same as that of Java Matcher.replaceAll().                    | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Regex Filtering Interceptor    | The interceptor treats the event body as text and matches it against the configured regular       | + |                                | expression to filter events. The regular expression can be used to include or exclude events.      | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + | Regex Extractor Interceptor    | The interceptor extracts content from the original events using a regular expression and adds     | + |                                | the content to the header of events.                                                               | + +--------------------------------+----------------------------------------------------------------------------------------------------+ + +**Regex Filtering Interceptor** is used as an example to describe how to use an interceptor. (For other interceptor types, see the configuration provided on the official website.) + +.. table:: **Table 6** Parameter configuration for **Regex Filtering Interceptor** + + +---------------+---------------+----------------------------------------------------------------------------------------+ + | Parameter     | Default Value | Description                                                                            | + +===============+===============+========================================================================================+ + | type          | ``-``         | Specifies the component type name. The value must be **regex_filter**.                | + +---------------+---------------+----------------------------------------------------------------------------------------+ + | regex         | ``-``         | Specifies the regular expression used to match events.                                | + +---------------+---------------+----------------------------------------------------------------------------------------+ + | excludeEvents | false         | By default, the matched events are collected. If this parameter is set to **true**,   | + |               |               | the matched events are deleted and the unmatched events are retained.                 | + +---------------+---------------+----------------------------------------------------------------------------------------+ + +Configuration example (netcat TCP is used as the source, and logger is used as the sink): after configuring the preceding parameters, run the **telnet** *Host name or IP address* **44444** command on the Linux host, and enter one string that matches the regular expression and another string that does not. The log shows that only the matched string is transmitted. + +..
code-block:: + + #define the source, channel, sink + server.sources = r1 + + server.channels = c1 + server.sinks = k1 + + #config the source + server.sources.r1.type = netcat + server.sources.r1.bind = ${Host IP address} + server.sources.r1.port = 44444 + server.sources.r1.interceptors= i1 + server.sources.r1.interceptors.i1.type= regex_filter + server.sources.r1.interceptors.i1.regex= (flume)|(myflume) + server.sources.r1.interceptors.i1.excludeEvents= false + server.sources.r1.channels = c1 + + #config the channel + server.channels.c1.type = memory + server.channels.c1.capacity = 1000 + server.channels.c1.transactionCapacity = 100 + #config the sink + server.sinks.k1.type = logger + server.sinks.k1.channel = c1 diff --git a/doc/component-operation-guide-lts/source/using_flume/connecting_flume_to_kafka_in_security_mode.rst b/doc/component-operation-guide-lts/source/using_flume/connecting_flume_to_kafka_in_security_mode.rst new file mode 100644 index 0000000..a23f884 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/connecting_flume_to_kafka_in_security_mode.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1071.html + +.. _mrs_01_1071: + +Connecting Flume to Kafka in Security Mode +========================================== + +Scenario +-------- + +This section describes how to connect to Kafka using the Flume client in security mode. + +Procedure +--------- + +#. Create a **jaas.conf** file and save it to **${**\ *Flume client installation directory*\ **} /conf**. The content of the **jaas.conf** file is as follows: + + .. code-block:: + + KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + useKeyTab=true + keyTab="/opt/test/conf/user.keytab" + principal="flume_hdfs@" + useTicketCache=false + storeKey=true + debug=true; + }; + + Set **keyTab** and **principal** based on site requirements. The configured **principal** must have certain kafka permissions. + +#. Configure services. Set the port number of **kafka.bootstrap.servers** to **21007**, and set **kafka.security.protocol** to **SASL_PLAINTEXT**. + +#. If the domain name of the cluster where Kafka is located is changed, change the value of *-Dkerberos.domain.name* in the **flume-env.sh** file in **$**\ {*Flume client installation directory*} **/conf/** based on the site requirements. + +#. Upload the configured **properties.properties** file to **$**\ {*Flume client installation directory*} **/conf**. diff --git a/doc/component-operation-guide-lts/source/using_flume/connecting_flume_with_hive_in_security_mode.rst b/doc/component-operation-guide-lts/source/using_flume/connecting_flume_with_hive_in_security_mode.rst new file mode 100644 index 0000000..fd41bdd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/connecting_flume_with_hive_in_security_mode.rst @@ -0,0 +1,181 @@ +:original_name: mrs_01_1072.html + +.. _mrs_01_1072: + +Connecting Flume with Hive in Security Mode +=========================================== + +Scenario +-------- + +This section describes how to use Flume to connect to Hive (version 3.1.0) in the cluster. + +Prerequisites +------------- + +Flume and Hive have been correctly installed in the cluster. The services are running properly, and no alarm is reported. + +Procedure +--------- + +#. 
Import the following JAR packages to the lib directory (client/server) of the Flume instance to be tested as user **omm**: + + - antlr-2.7.7.jar + - antlr-runtime-3.4.jar + - calcite-core-1.16.0.jar + - hadoop-mapreduce-client-core-3.1.1.jar + - hive-beeline-3.1.0.jar + - hive-cli-3.1.0.jar + - hive-common-3.1.0.jar + - hive-exec-3.1.0.jar + - hive-hcatalog-core-3.1.0.jar + - hive-hcatalog-pig-adapter-3.1.0.jar + - hive-hcatalog-server-extensions-3.1.0.jar + - hive-hcatalog-streaming-3.1.0.jar + - hive-metastore-3.1.0.jar + - hive-service-3.1.0.jar + - libfb303-0.9.3.jar + - hadoop-plugins-1.0.jar + + You can obtain the JAR package from the Hive installation directory and restart the Flume process to ensure that the JAR package is loaded to the running environment. + +#. Set Hive configuration items. + + On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations** > **HiveServer** > **Customization** > **hive.server.customized.configs**. + + |image1| + + Example configurations: + + +----------------------------------+------------------------------------------------+ + | Name | Value | + +==================================+================================================+ + | hive.support.concurrency | true | + +----------------------------------+------------------------------------------------+ + | hive.exec.dynamic.partition.mode | nonstrict | + +----------------------------------+------------------------------------------------+ + | hive.txn.manager | org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | + +----------------------------------+------------------------------------------------+ + | hive.compactor.initiator.on | true | + +----------------------------------+------------------------------------------------+ + | hive.compactor.worker.threads | 1 | + +----------------------------------+------------------------------------------------+ + +#. Prepare the system user **flume_hive** who has the supergroup and Hive permissions, install the client, and create the required Hive table. + + Example: + + a. The cluster client has been correctly installed. For example, the installation directory is **/opt/client**. + + b. Run the following command to authenticate the user: + + **cd /opt/client** + + **source bigdata_env** + + **kinit flume_hive** + + c. Run the **beeline** command and run the following table creation statement: + + .. code-block:: + + create table flume_multi_type_part(id string, msg string) + partitioned by (country string, year_month string, day string) + clustered by (id) into 5 buckets + stored as orc TBLPROPERTIES('transactional'='true'); + + d. Run the **select \* from** *Table name*\ **;** command to query data in the table. + + In this case, the number of data records in the table is **0**. + +#. Prepare related configuration files. Assume that the client installation package is stored in **/opt/FusionInsight_Cluster_1_Services_ClientConfig**. + + a. Obtain the following files from the $\ *Client decompression directory*\ **/Hive/config** directory: + + - hivemetastore-site.xml + - hive-site.xml + + b. Obtain the following files from the **$**\ *Client decompression directory*\ **/HDFS/config** directory: + + core-site.xml + + c. Create a directory on the host where the Flume instance is started and save the prepared files to the created directory. + + Example: **/opt/hivesink-conf/hive-site.xml**. + + d. 
Copy all property configurations in the **hivemetastore-site.xml** file to the **hive-site.xml** file and ensure that the copied configurations are placed before the original configurations. + + This is because Hive loads the configuration data in sequence. + + .. note:: + + Ensure that the Flume running user **omm** has the read and write permissions on the directory where the configuration file is stored. + +#. Observe the result. + + On the Hive client, run the **select \* from** *Table name*\ **;** command. Check whether the corresponding data has been written to the Hive table. + +Examples +-------- + +Flume configuration example (SpoolDir--Mem--Hive): + +.. code-block:: + + server.sources = spool_source + server.channels = mem_channel + server.sinks = Hive_Sink + + #config the source + server.sources.spool_source.type = spooldir + server.sources.spool_source.spoolDir = /tmp/testflume + server.sources.spool_source.montime = + server.sources.spool_source.fileSuffix =.COMPLETED + server.sources.spool_source.deletePolicy = never + server.sources.spool_source.trackerDir =.flumespool + server.sources.spool_source.ignorePattern = ^$ + server.sources.spool_source.batchSize = 20 + server.sources.spool_source.inputCharset =UTF-8 + server.sources.spool_source.selector.type = replicating + server.sources.spool_source.fileHeader = false + server.sources.spool_source.fileHeaderKey = file + server.sources.spool_source.basenameHeaderKey= basename + server.sources.spool_source.deserializer = LINE + server.sources.spool_source.deserializer.maxBatchLine= 1 + server.sources.spool_source.deserializer.maxLineLength= 2048 + server.sources.spool_source.channels = mem_channel + + #config the channel + server.channels.mem_channel.type = memory + server.channels.mem_channel.capacity =10000 + server.channels.mem_channel.transactionCapacity= 2000 + server.channels.mem_channel.channelfullcount= 10 + server.channels.mem_channel.keep-alive = 3 + server.channels.mem_channel.byteCapacity = + server.channels.mem_channel.byteCapacityBufferPercentage= 20 + + #config the sink + server.sinks.Hive_Sink.type = hive + server.sinks.Hive_Sink.channel = mem_channel + server.sinks.Hive_Sink.hive.metastore = thrift://${any MetaStore service IP address}:21088 + server.sinks.Hive_Sink.hive.hiveSite = /opt/hivesink-conf/hive-site.xml + server.sinks.Hive_Sink.hive.coreSite = /opt/hivesink-conf/core-site.xml + server.sinks.Hive_Sink.hive.metastoreSite = /opt/hivesink-conf/hivemetastore-site.xml + server.sinks.Hive_Sink.hive.database = default + server.sinks.Hive_Sink.hive.table = flume_multi_type_part + server.sinks.Hive_Sink.hive.partition = Tag,%Y-%m,%d + server.sinks.Hive_Sink.hive.txnsPerBatchAsk= 100 + server.sinks.Hive_Sink.hive.autoCreatePartitions= true + server.sinks.Hive_Sink.useLocalTimeStamp = true + server.sinks.Hive_Sink.batchSize = 1000 + server.sinks.Hive_Sink.hive.kerberosPrincipal= super1 + server.sinks.Hive_Sink.hive.kerberosKeytab= /opt/mykeytab/user.keytab + server.sinks.Hive_Sink.round = true + server.sinks.Hive_Sink.roundValue = 10 + server.sinks.Hive_Sink.roundUnit = minute + server.sinks.Hive_Sink.serializer = DELIMITED + server.sinks.Hive_Sink.serializer.delimiter= ";" + server.sinks.Hive_Sink.serializer.serdeSeparator= ';' + server.sinks.Hive_Sink.serializer.fieldnames= id,msg + +..
|image1| image:: /_static/images/en-us_image_0000001348739765.png diff --git a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/configuring_the_encrypted_transmission.rst b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/configuring_the_encrypted_transmission.rst new file mode 100644 index 0000000..69e54a7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/configuring_the_encrypted_transmission.rst @@ -0,0 +1,349 @@ +:original_name: mrs_01_1069.html + +.. _mrs_01_1069: + +Configuring the Encrypted Transmission +====================================== + +Scenario +-------- + +This section describes how to configure the server and client parameters of the Flume service (including the Flume and MonitorServer roles) after the cluster is installed to ensure proper running of the service. + +Prerequisites +------------- + +The cluster and Flume service have been installed. + +Procedure +--------- + +#. Generate the certificate trust lists of the server and client of the Flume role respectively. + + a. Remotely log in to the node using ECM where the Flume server is to be installed as user **omm**. Go to the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** directory. + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + .. note:: + + The version 8.1.2.2 is used as an example. Replace it with the actual version number. + + b. Run the following command to generate and export the server and client certificates of the Flume role: + + **sh geneJKS.sh -f sNetty12@ -g cNetty12@** + + The generated certificate is saved in the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf** path . + + - **flume_sChat.jks** is the certificate library of the Flume role server. **flume_sChat.crt** is the exported file of the **flume_sChat.jks** certificate. **-f** indicates the password of the certificate and certificate library. + - **flume_cChat.jks** is the certificate library of the Flume role client. **flume_cChat.crt** is the exported file of the **flume_cChat.jks** certificate. **-g** indicates the password of the certificate and certificate library. + - **flume_sChatt.jks** and **flume_cChatt.jks** are the SSL certificate trust lists of the Flume server and client, respectively. + + .. note:: + + All user-defined passwords involved in this section (such as *sNetty12@*) must meet the following requirements: + + - The password must contain at least four types of uppercase letters, lowercase letters, digits, and special characters. + - The password must contain 8 to 64 characters. + - It is recommended that the user-defined passwords be changed periodically (for example, every three months), and certificates and trust lists be generated again to ensure security. + +#. Configure the server parameters of the Flume role and upload the configuration file to the cluster. + + a. Remotely log in to any node where the Flume role is located as user **omm** using ECM. Run the following command to go to the ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin directory: + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + b. .. 
_mrs_01_1069__en-us_topic_0000001173631348_l9f81f0e892824e79a1414cd62cce07ba: + + Run the following command to generate and obtain Flume server keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. It is the password of the **flume_sChat.jks** certificate library, for example, *sNetty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=D03C2D03D97CBA3F4FD2491A40CAA5E0 + + c. Use the Flume configuration tool on the FusionInsight Manager portal to configure the server parameters and generate the configuration file. + + #. Log in to FusionInsight Manager. Choose **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **server**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + For example, use Avro Source, File Channel, and HDFS Sink, as shown in :ref:`Figure 1 `. + + .. _mrs_01_1069__en-us_topic_0000001173631348_f7115f88950ae456f9f3bd83a9a12eb02: + + .. figure:: /_static/images/en-us_image_0000001349059641.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If the server parameters of the Flume role have been configured, you can choose **Services** > **Flume** > **Instance** on FusionInsight Manager. Then select the corresponding Flume role instance and click the **Download** button behind the **flume.config.file** parameter on the **Instance Configurations** page to obtain the existing server parameter configuration file. Choose **Services** > **Flume** > **Import** to change the relevant configuration items of encrypted transmission after the file is imported. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1069__en-us_topic_0000001173631348_te7d3219190a74a0aba371689e6bdb84d: + + .. table:: **Table 1** Parameters to be modified of the Flume role server + + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+===================================================================================================================================+===================================================================================================================+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. 
| | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | keystore | Indicates the server certificate. | ${BIGDATA_HOME\ **}**/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_sChat.jks | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | keystore-password | Specifies the password of the key library, which is the password required to obtain the keystore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of password obtained in :ref:`2.b `. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | truststore | Indicates the SSL certificate trust list of the server. | ${BIGDATA_HOME\ **}**/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_sChatt.jks | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | truststore-password | Specifies the trust list password, which is the password required to obtain the truststore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of password obtained in :ref:`2.b `. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + + d. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume**. On the displayed page, click the **Flume** role under **Role**. + + e. Select the Flume role of the node where the configuration file is to be uploaded, choose **Instance Configurations** > **Import** beside the **flume.config.file**, and select the **properties.properties** file. + + .. note:: + + - An independent server configuration file can be uploaded to each Flume instance. + - This step is required for updating the configuration file. Modifying the configuration file on the background is an improper operation because the modification will be overwritten after configuration synchronization. + + f. Click **Save**, and then click **OK**. Click **Finish**. + +#. Set the client parameters of the Flume role. + + a. Run the following commands to copy the generated client certificate (**flume_cChat.jks**) and client trust list (**flume_cChatt.jks**) to the client directory, for example, **/opt/flume-client/fusionInsight-flume-1.9.0/conf/**. (The Flume client must have been installed.) **10.196.26.1** is the service plane IP address of the node where the client resides. 
+ + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_cChat.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_cChatt.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + .. note:: + + When copying the client certificate, you need to enter the password of user **user** of the host (for example, **10.196.26.1**) where the client resides. + + b. Log in to the node where the Flume client is decompressed as user **user**. Run the following command to go to the client directory **opt/flume-client/fusionInsight-flume-1.9.0/bin**. + + **cd** **opt/flume-client/fusionInsight-flume-1.9.0/bin** + + c. .. _mrs_01_1069__en-us_topic_0000001173631348_l5265677717ab4dd5971a3b6a0d0be5f6: + + Run the following command to generate and obtain Flume client keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *flumechatclient* and the password of the *flume_cChat.jks* certificate library, for example *cNetty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=4FD2491A40CAA5E0D03C2D03D97CBA3F + + .. note:: + + If the following error message is displayed, run the export **JAVA_HOME=\ JDK path** command. + + .. code-block:: + + JAVA_HOME is null in current user,please install the JDK and set the JAVA_HOME + + d. Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + For example, use SpoolDir Source, File Channel, and Avro Sink, as shown in :ref:`Figure 2 `. + + .. _mrs_01_1069__en-us_topic_0000001173631348_f800f39a7cdcf443eab83c9ebcd2211bc: + + .. figure:: /_static/images/en-us_image_0000001295739988.png + :alt: **Figure 2** Example for the Flume configuration tool + + **Figure 2** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 2 ` based on the actual environment. + + .. note:: + + - If the client parameters of the Flume role have been configured, you can obtain the existing client parameter configuration file from *client installation directory*\ **/fusioninsight-flume-1.9.0/conf/properties.properties** to ensure that the configuration is in concordance with the previous. Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + - A unique checkpoint directory needs to be configured for each File Channel. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. 
_mrs_01_1069__en-us_topic_0000001173631348_t231a870090124a8e8556717e6a7db11c: + + .. table:: **Table 2** Parameters to be modified of the Flume role client + + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+===================================================================================================================================+===================================================================+ + | ssl | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | keystore | Specified the client certificate. | /opt/flume-client/fusionInsight-flume-1.9.0/conf/flume_cChat.jks | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | keystore-password | Specifies the password of the key library, which is the password required to obtain the keystore information. | 4FD2491A40CAA5E0D03C2D03D97CBA3F | + | | | | + | | Enter the value of password obtained in :ref:`3.c `. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | truststore | Indicates the SSL certificate trust list of the client. | /opt/flume-client/fusionInsight-flume-1.9.0/conf/flume_cChatt.jks | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | truststore-password | Specifies the trust list password, which is the password required to obtain the truststore information. | 4FD2491A40CAA5E0D03C2D03D97CBA3F | + | | | | + | | Enter the value of password obtained in :ref:`3.c `. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + + e. Upload the **properties.properties** file to **flume/conf/** under the installation directory of the Flume client. + +#. Generate the certificate and trust list of the server and client of the MonitorServer role respectively. + + a. Log in to the host using ECM with the MonitorServer role assigned as user **omm**. + + Go to the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** directory. + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + b. 
Run the following command to generate and export the server and client certificates of the MonitorServer role: + + **sh geneJKS.sh -m sKitty12@ -n cKitty12@** + + The generated certificate is saved in the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf** path. Where: + + - **ms_sChat.jks** is the certificate library of the MonitorServer role server. **ms_sChat.crt** is the exported file of the **ms_sChat.jks** certificate. **-m** indicates the password of the certificate and certificate library. + - **ms_cChat.jks** is the certificate library of the MonitorServer role client. **ms_cChat.crt** is the exported file of the **ms_cChat.jks** certificate. **-n** indicates the password of the certificate and certificate library. + - **ms_sChatt.jks** and **ms_cChatt.jks** are the SSL certificate trust lists of the MonitorServer server and client, respectively. + +#. Set the server parameters of the MonitorServer role. + + a. .. _mrs_01_1069__en-us_topic_0000001173631348_l7cc74e0469cb45f4aba9974f2846c1e0: + + Run the following command to generate and obtain MonitorServer server keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *mschatserver* and the password of the *ms_sChat.jks* certificate library, for example *sKitty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=AA5E0D03C2D4FD24CBA3F91A40C03D97 + + b. Run the following command to open the ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/service/application.properties file: Modify related parameters based on the description in :ref:`Table 3 `, save the modification, and exit. + + **vi ${BIGDATA_HOME}/FusionInsight_Porter\_**\ 8.1.2.2\ **/install/FusionInsight-Flume-1.9.0/flume/conf/service/application.properties** + + .. _mrs_01_1069__en-us_topic_0000001173631348_tc0d290285ae94086985870f879b563c2: + + .. table:: **Table 3** Parameters to be modified of the MonitorServer role server + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=====================================+===========================================================================================================================================================================+==========================================================================================================+ + | ssl_need_kspasswd_decrypt_key | Specifies whether to enable the user-defined key encryption and decryption function. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. 
| | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_enable | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_sChat.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_trust_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_sChatt.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_key_store_password | Indicates the client certificate password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the certificate). | AA5E0D03C2D4FD24CBA3F91A40C03D97 | + | | | | + | | Enter the value of password obtained in :ref:`5.a `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_trust_key_store_password | Specifies the trustkeystore password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the trust list). | AA5E0D03C2D4FD24CBA3F91A40C03D97 | + | | | | + | | Enter the value of password obtained in :ref:`5.a `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_need_client_auth | Indicates whether to enable the client authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. 
| | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + + c. Restart the MonitorServer instance. Choose **Services** > **Flume** > **Instance** > **MonitorServer**, select the MonitorServer instance, and choose **More** > **Restart Instance**. Enter the system administrator password and click **OK**. After the restart is complete, click **Finish**. + +#. Set the client parameters of the MonitorServer role. + + a. Run the following commands to copy the generated client certificate (**ms_cChat.jks**) and client trust list (**ms_cChatt.jks**) to the **/opt/flume-client/fusionInsight-flume-1.9.0/conf/** client directory. **10.196.26.1** is the service plane IP address of the node where the client resides. + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChat.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChatt.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + b. Log in to the node where the Flume client is located as **user**. Run the following command to go to the client directory **/opt/flume-client/fusionInsight-flume-1.9.0/bin**. + + **cd** **/opt/flume-client/fusionInsight-flume-1.9.0/bin** + + c. .. _mrs_01_1069__en-us_topic_0000001173631348_l252c5a768cc34fcca9cfaa5a90dfe8c0: + + Run the following command to generate and obtain MonitorServer client keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *mschatclient* and the password of the *ms_cChat.jks* certificate library, for example *cKitty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=BA3F91A40C03D97AA5E0D03C2D4FD24C + + d. Run the following command to open the **/opt/flume-client/fusionInsight-flume-1.9.0/conf/service/application.properties** file. (**/opt/flume-client/fusionInsight-flume-1.9.0** is the directory where the client software is installed.) Modify related parameters based on the description in :ref:`Table 4 `, save the modification, and exit. + + **vi** **/opt/flume-client/fusionInsight-flume-1.9.0/flume/conf/service/application.properties** + + .. _mrs_01_1069__en-us_topic_0000001173631348_tea1b721973a843b7891ab85f51d2f2e6: + + .. 
table:: **Table 4** Parameters to be modified of the MonitorServer role client + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=====================================+=====================================================================================================================================================================+==========================================================================================================+ + | ssl_need_kspasswd_decrypt_key | Indicates whether to enable the user-defined key encryption and decryption function. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_enable | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChat.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_trust_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChatt.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_key_store_password | Specifies the keystore password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the certificate). | BA3F91A40C03D97AA5E0D03C2D4FD24C | + | | | | + | | Enter the value of **password** obtained in :ref:`6.c `. 
| | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_trust_key_store_password | Specifies the trustkeystore password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the trust list). | BA3F91A40C03D97AA5E0D03C2D4FD24C | + | | | | + | | Enter the value of **password** obtained in :ref:`6.c `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_need_client_auth | Indicates whether to enable the client authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst new file mode 100644 index 0000000..dff45c0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1068.html + +.. _mrs_01_1068: + +Encrypted Transmission +====================== + +- :ref:`Configuring the Encrypted Transmission ` +- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_the_encrypted_transmission + typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs diff --git a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst new file mode 100644 index 0000000..0c811fb --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst @@ -0,0 +1,411 @@ +:original_name: mrs_01_1070.html + +.. _mrs_01_1070: + +Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS +========================================================================= + +Scenario +-------- + +This section describes how to use Flume client to collect static logs from a local PC and save them to the **/flume/test** directory on HDFS. + +Prerequisites +------------- + +- The cluster, HDFS and Flume services, and Flume client have been installed. +- User **flume_hdfs** has been created, and the HDFS directory and data used for log verification have been authorized to the user. + +Procedure +--------- + +#. 
Generate the certificate trust lists of the server and client of the Flume role respectively. + + a. Log in to the node where the Flume server is located as user **omm**. Go to the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** directory. + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + b. Run the following command to generate and export the server and client certificates of the Flume role: + + **sh geneJKS.sh -f sNetty12@ -g cNetty12@** + + The generated certificate is saved in the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf** path . + + - **flume_sChat.jks** is the certificate library of the Flume role server. **flume_sChat.crt** is the exported file of the **flume_sChat.jks** certificate. **-f** indicates the password of the certificate and certificate library. + - **flume_cChat.jks** is the certificate library of the Flume role client. **flume_cChat.crt** is the exported file of the **flume_cChat.jks** certificate. **-g** indicates the password of the certificate and certificate library. + - **flume_sChatt.jks** and **flume_cChatt.jks** are the SSL certificate trust lists of the Flume server and client, respectively. + + .. note:: + + All user-defined passwords involved in this section (such as *sNetty12@*) must meet the following requirements: + + - Contain at least four types of the following: uppercase letters, lowercase letters, digits, and special characters. + - Contain at least eight characters and a maximum of 64 characters. + - It is recommended that the user-defined passwords be changed periodically (for example, every three months), and certificates and trust lists be generated again to ensure security. + +#. On FusionInsight Manager, choose **System > User** and choose **More > Download Authentication Credential** to download the Kerberos certificate file of user **flume_hdfs** and save it to the local host. +#. Configure the server parameters of the Flume role and upload the configuration file to the cluster. + + a. Log in to any node where the Flume role is located as user **omm**. Run the following command to go to the ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin directory: + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + b. .. _mrs_01_1070__en-us_topic_0000001219350773_lf43fc3e7d9364ddb9e475908dc382fc9: + + Run the following command to generate and obtain Flume server keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. It is the password of the **flume_sChat.jks** certificate library, for example, *sNetty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=D03C2D03D97CBA3F4FD2491A40CAA5E0 + + c. Use the Flume configuration tool on the FusionInsight Manager portal to configure the server parameters and generate the configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **server**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + For example, use Avro Source, File Channel, and HDFS Sink, as shown in :ref:`Figure 1 `. + + .. 
_mrs_01_1070__en-us_topic_0000001219350773_f6daeef3446e547b29a54dde54d85e083: + + .. figure:: /_static/images/en-us_image_0000001349139425.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If the server parameters of the Flume role have been configured, you can choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Instance** on FusionInsight Manager. Then select the corresponding Flume role instance and click the **Download** button behind the **flume.config.file** parameter on the **Instance Configurations** page to obtain the existing server parameter configuration file. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + - A unique checkpoint directory needs to be configured for each File Channel. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1070__en-us_topic_0000001219350773_t90702710f4c74f1ea2a89064a9507879: + + .. table:: **Table 1** Parameters to be modified of the Flume role server + + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +========================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. 
| test | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | Specifies the IP address to which Avro Source is bound. This parameter cannot be left blank. It must be configured as the IP address that the server configuration file will upload. | 192.168.108.11 | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | Specifies the IP address to which Avro Source is bound. This parameter cannot be left blank. It must be configured as an unused port. | 21154 | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. 
| | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | Indicates the server certificate. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_sChat.jks | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-password | Specifies the password of the key library, which is the password required to obtain the keystore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of **password** obtained in :ref:`3.b `. | | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore | Indicates the SSL certificate trust list of the server. 
| ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_sChatt.jks | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-password | Specifies the trust list password, which is the password required to obtain the truststore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of **password** obtained in :ref:`3.b `. | | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDirs | Specifies the directory for storing buffer data. The run directory is used by default. Configuring multiple directories on disks can improve transmission efficiency. Use commas (,) to separate multiple directories. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/data** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flumeserver/data | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointDir | Specifies the directory for storing the checkpoint information, which is under the run directory by default. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/checkpoint** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. 
| /srv/BigData/hadoop/data1/flumeserver/checkpoint | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | Specifies the transaction size, that is, the number of events in a transaction that can be processed by the current Channel. The size cannot be smaller than the batchSize of Source. Setting the same size as batchSize is recommended. | 61200 | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | Specifies the HDFS data write directory. This parameter cannot be left blank. | hdfs://hacluster/flume/test | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | Specifies the prefix of the file that is being written to HDFS. | TMP\_ | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | Specifies the maximum number of events that can be written to HDFS once. 
| 61200 | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. | flume_hdfs | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | Specifies the keytab file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hdfs**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | Specifies whether to use the local time. Possible values are **true** and **false**. 
| true | + +------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + d. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume**. On the displayed page, click the **Flume** role under **Role**. + + e. Select the Flume role of the node where the configuration file is to be uploaded, choose **Instance Configurations** > **Import** beside the **flume.config.file**, and select the **properties.properties** file. + + .. note:: + + - An independent server configuration file can be uploaded to each Flume instance. + - This step is required for updating the configuration file. Modifying the configuration file on the background is an improper operation because the modification will be overwritten after configuration synchronization. + + f. Click **Save**, and then click **OK**. + + g. Click **Finish**. + +#. Configure the client parameters of the Flume role. + + a. Run the following commands to copy the generated client certificate (**flume_cChat.jks**) and client trust list (**flume_cChatt.jks**) to the client directory, for example, **/opt/flume-client/fusionInsight-flume-1.9.0/conf/**. (The Flume client must have been installed.) **10.196.26.1** is the service plane IP address of the node where the client resides. + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_cChat.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/flume_cChatt.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + .. note:: + + When copying the client certificate, you need to enter the password of user **user** of the host (for example, **10.196.26.1**) where the client resides. + + b. Log in to the node where the Flume client is decompressed as user **user**. Run the following command to go to the client directory **/opt/flume-client/fusionInsight-flume-1.9.0/bin**. + + **cd** opt/flume-client/fusionInsight-flume-1.9.0/bin + + c. .. _mrs_01_1070__en-us_topic_0000001219350773_lf5cdb5eca44842caac47a27a09a4e206: + + Run the following command to generate and obtain Flume client keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *flumechatclient* and the password of the *flume_cChat.jks* certificate library, for example *cNetty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=4FD2491A40CAA5E0D03C2D03D97CBA3F + + .. note:: + + If the following error message is displayed, run the export **JAVA_HOME=\ JDKpath** command. + + .. code-block:: + + JAVA_HOME is null in current user,please install the JDK and set the JAVA_HOME + + d. 
Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + Use SpoolDir Source, File Channel, and Avro Sink, as shown in :ref:`Figure 2 `. + + .. _mrs_01_1070__en-us_topic_0000001219350773_f736dda46e68742568d523a52754a5fde: + + .. figure:: /_static/images/en-us_image_0000001296059712.png + :alt: **Figure 2** Example for the Flume configuration tool + + **Figure 2** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 2 ` based on the actual environment. + + .. note:: + + - If the client parameters of the Flume role have been configured, you can obtain the existing client parameter configuration file from *client installation directory*\ **/fusioninsight-flume-1.9.0/conf/properties.properties** to ensure that the configuration is in concordance with the previous. Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1070__en-us_topic_0000001219350773_t4e49dd595a71448eb33a418332772306: + + .. table:: **Table 2** Parameters to be modified of the Flume role client + + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================+===================================================================+ + | Name | The value must be unique and cannot be left blank. 
| test | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | spoolDir | Specifies the directory where the file to be collected resides. This parameter cannot be left blank. The directory needs to exist and have the write, read, and execute permissions on the flume running user. | /srv/BigData/hadoop/data1/zb | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | trackerDir | Specifies the path for storing the metadata of files collected by Flume. | /srv/BigData/hadoop/data1/tracker | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | batch-size | Specifies the number of events that Flume sends in a batch. | 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | dataDirs | Specifies the directory for storing buffer data. The run directory is used by default. Configuring multiple directories on disks can improve transmission efficiency. Use commas (,) to separate multiple directories. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/data** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. 
| /srv/BigData/hadoop/data1/flume/data | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | checkpointDir | Specifies the directory for storing the checkpoint information, which is under the run directory by default. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/checkpoint** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flume/checkpoint | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | transactionCapacity | Specifies the transaction size, that is, the number of events in a transaction that can be processed by the current Channel. The size cannot be smaller than the batchSize of Source. Setting the same size as batchSize is recommended. | 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | hostname | Specifies the name or IP address of the host whose data is to be sent. This parameter cannot be left blank. Name or IP address must be configured to be the name or IP address that the Avro source associated with it. | 192.168.108.11 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | port | Specifies the IP address to which Avro Sink is bound. This parameter cannot be left blank. It must be consistent with the port that is monitored by the connected Avro Source. 
| 21154 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | keystore | Specifies the **flume_cChat.jks** certificate generated on the server. | /opt/flume-client/fusionInsight-flume-1.9.0/conf/flume_cChat.jks | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | keystore-password | Specifies the password of the key library, which is the password required to obtain the keystore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of **password** obtained in :ref:`4.c `. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | truststore | Indicates the SSL certificate trust list of the server. 
| /opt/flume-client/fusionInsight-flume-1.9.0/conf/flume_cChatt.jks | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + | truststore-password | Specifies the trust list password, which is the password required to obtain the truststore information. | D03C2D03D97CBA3F4FD2491A40CAA5E0 | + | | | | + | | Enter the value of **password** obtained in :ref:`4.c `. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------+ + + e. Upload the **properties.properties** file to **flume/conf/** under the installation directory of the Flume client. + +#. Generate the certificate and trust list of the server and client of the MonitorServer role respectively. + + a. Log in to the host with the MonitorServer role assigned as user **omm**. + + Go to the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** directory. + + **cd ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/bin** + + b. Run the following command to generate and export the server and client certificates of the MonitorServer role: + + **sh geneJKS.sh -m sKitty12@ -n cKitty12@** + + The generated certificate is saved in the **${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf** path. Where: + + - **ms_sChat.jks** is the certificate library of the MonitorServer role server. **ms_sChat.crt** is the exported file of the **ms_sChat.jks** certificate. **-m** indicates the password of the certificate and certificate library. + - **ms_cChat.jks** is the certificate library of the MonitorServer role client. **ms_cChat.crt** is the exported file of the **ms_cChat.jks** certificate. **-n** indicates the password of the certificate and certificate library. + - **ms_sChatt.jks** and **ms_cChatt.jks** are the SSL certificate trust lists of the MonitorServer server and client, respectively. + +#. Set the server parameters of the MonitorServer role. + + a. .. _mrs_01_1070__en-us_topic_0000001219350773_la6ea6d1571ea4b2a94b3c942a18144db: + + Run the following command to generate and obtain MonitorServer server keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *mschatserver* and the password of the *ms_sChat.jks* certificate library, for example *sKitty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=AA5E0D03C2D4FD24CBA3F91A40C03D97 + + b. 
Run the following command to open the ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/service/application.properties file: Modify related parameters based on the description in :ref:`Table 3 `, save the modification, and exit. + + **vi ${BIGDATA_HOME}/FusionInsight_Porter\_**\ 8.1.2.2\ **/install/FusionInsight-Flume-1.9.0/flume/conf/service/application.properties** + + .. _mrs_01_1070__en-us_topic_0000001219350773_tc32e0ef5ae504791afb953e98354efa7: + + .. table:: **Table 3** Parameters to be modified of the MonitorServer role server + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=====================================+===========================================================================================================================================================================+==========================================================================================================+ + | ssl_need_kspasswd_decrypt_key | Indicates whether to enable the user-defined key encryption and decryption function. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_enable | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_sChat.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_trust_key_store | Set this parameter based on the specific storage location. 
| ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_sChatt.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_key_store_password | Indicates the client certificate password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the certificate). | AA5E0D03C2D4FD24CBA3F91A40C03D97 | + | | | | + | | Enter the value of **password** obtained in :ref:`6.a `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_server_trust_key_store_password | Indicates the client trust list password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the trust list). | AA5E0D03C2D4FD24CBA3F91A40C03D97 | + | | | | + | | Enter the value of **password** obtained in :ref:`6.a `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_need_client_auth | Indicates whether to enable the client authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + + c. Restart the MonitorServer instance. Choose **Cluster >** *Name of the desired cluster* **> Services > Flume > Instance > MonitorServer**, select the configured MonitorServer instance, and choose **More > Restart Instance**. Enter the system administrator password and click **OK**. After the restart is complete, click **Finish**. + +#. Set the client parameters of the MonitorServer role. + + a. Run the following commands to copy the generated client certificate (**ms_cChat.jks**) and client trust list (**ms_cChatt.jks**) to the **/opt/flume-client/fusionInsight-flume-1.9.0/conf/** client directory. **10.196.26.1** is the service plane IP address of the node where the client resides. + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChat.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + **scp ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChatt.jks user@10.196.26.1:/opt/flume-client/fusionInsight-flume-1.9.0/conf/** + + b. Log in to the node where the Flume client is located as user **user**. 
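+
+      Before continuing, you can optionally confirm that the certificate and trust list files copied in the previous step are present in the client configuration directory. This is only a quick sanity check, and the path below assumes the example client installation directory used in this section:
+
+      .. code-block::
+
+         ls -l /opt/flume-client/fusionInsight-flume-1.9.0/conf/ms_cChat.jks /opt/flume-client/fusionInsight-flume-1.9.0/conf/ms_cChatt.jks
+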
Run the following command to go to the client directory **/opt/flume-client/fusionInsight-flume-1.9.0/bin**. + + **cd** /opt/flume-client/fusionInsight-flume-1.9.0/bin + + c. .. _mrs_01_1070__en-us_topic_0000001219350773_l6c040d3a99c04a7d87c53e59bafe8394: + + Run the following command to generate and obtain MonitorServer client keystore password, trust list password, and keystore-password encrypted private key information. Enter the password twice and confirm the password. The password is the same as the password of the certificate whose alias is *mschatclient* and the password of the *ms_cChat.jks* certificate library, for example *cKitty12@*. + + **./genPwFile.sh** + + **cat password.property** + + .. code-block:: + + password=BA3F91A40C03D97AA5E0D03C2D4FD24C + + d. Run the following command to open the **/opt/flume-client/fusionInsight-flume-1.9.0/conf/service/application.properties** file. (**/opt/flume-client/fusionInsight-flume-1.9.0** is the directory where the client is installed.) Modify related parameters based on the description in :ref:`Table 4 `, save the modification, and exit. + + **vi** **/opt/flume-client/fusionInsight-flume-1.9.0/conf/service/application.properties** + + .. _mrs_01_1070__en-us_topic_0000001219350773_ta0130cca376a4aaf833fa310a2e59e9d: + + .. table:: **Table 4** Parameters to be modified of the MonitorServer role client + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=====================================+=====================================================================================================================================================================+==========================================================================================================+ + | ssl_need_kspasswd_decrypt_key | Indicates whether to enable the user-defined key encryption and decryption function. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_enable | Indicates whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_key_store | Set this parameter based on the specific storage location. 
| ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChat.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_trust_key_store | Set this parameter based on the specific storage location. | ${BIGDATA_HOME}/FusionInsight_Porter\_8.1.2.2/install/FusionInsight-Flume-1.9.0/flume/conf/ms_cChatt.jks | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_key_store_password | Specifies the keystore password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the certificate). | BA3F91A40C03D97AA5E0D03C2D4FD24C | + | | | | + | | Enter the value of **password** obtained in :ref:`7.c `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_client_trust_key_store_password | Specifies the trustkeystore password. Set this parameter based on the actual situation of certificate creation (the plaintext key used to generate the trust list). | BA3F91A40C03D97AA5E0D03C2D4FD24C | + | | | | + | | Enter the value of **password** obtained in :ref:`7.c `. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | ssl_need_client_auth | Indicates whether to enable the client authentication. (You are advised to enable this function to ensure security.) | true | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + +8. Verify log transmission. + + a. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster >** *Name of the desired cluster* > **Services** > **HDFS**, click the HDFS WebUI link to go to the HDFS WebUI, and choose **Utilities > Browse the file system**. + + b. Check whether the data is generated in the **/flume/test** directory on the HDFS. + + + .. 
figure:: /_static/images/en-us_image_0000001295899872.png + :alt: **Figure 3** Checking HDFS directories and files + + **Figure 3** Checking HDFS directories and files diff --git a/doc/component-operation-guide-lts/source/using_flume/flume_client_cgroup_usage_guide.rst b/doc/component-operation-guide-lts/source/using_flume/flume_client_cgroup_usage_guide.rst new file mode 100644 index 0000000..bc85767 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/flume_client_cgroup_usage_guide.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_1082.html + +.. _mrs_01_1082: + +Flume Client Cgroup Usage Guide +=============================== + +Scenario +-------- + +This section describes how to join and log out of a cgroup, query the cgroup status, and change the cgroup CPU threshold. + +Procedure +--------- + +- **Join Cgroup** + + Assume that the Flume client installation path is **/opt/FlumeClient**, and the cgroup CPU threshold is 50%. Run the following command to join a cgroup: + + **cd /opt/FlumeClient/fusioninsight-flume-1.9.0/bin** + + **./flume-manage.sh cgroup join 50** + + .. note:: + + - This command can be used to join a cgroup and change the cgroup CPU threshold. + - The value of the CPU threshold of a cgroup ranges from 1 to 100 x *N*. *N* indicates the number of CPU cores. + +- **Check Cgroup status** + + Assume that the Flume client installation path is **/opt/FlumeClient**. Run the following commands to query the cgroup status: + + **cd /opt/FlumeClient/fusioninsight-flume-1.9.0/bin** + + **./flume-manage.sh cgroup status** + +- **Exit Cgroup** + + Assume that the Flume client installation path is **/opt/FlumeClient**. Run the following commands to exit cgroup: + + **cd /opt/FlumeClient/fusioninsight-flume-1.9.0/bin** + + **./flume-manage.sh cgroup exit** + + .. note:: + + - After the client is installed, the default cgroup is automatically created. If the **-s** parameter is not configured during client installation, the default value **-1** is used. The default value indicates that the agent process is not restricted by the CPU usage. + - Joining or exiting a cgroup does not affect the agent process. Even if the agent process is not started, the joining or exiting operation can be performed successfully, and the operation will take effect after the next startup of the agent process. + - After the client is uninstalled, the cgroups created during the client installation are automatically deleted. diff --git a/doc/component-operation-guide-lts/source/using_flume/flume_configuration_parameter_description.rst b/doc/component-operation-guide-lts/source/using_flume/flume_configuration_parameter_description.rst new file mode 100644 index 0000000..78efabb --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/flume_configuration_parameter_description.rst @@ -0,0 +1,510 @@ +:original_name: mrs_01_0396.html + +.. _mrs_01_0396: + +Flume Configuration Parameter Description +========================================= + +Some parameters can be configured on Manager. + +Overview +-------- + +This section describes how to configure the sources, channels, and sinks of Flume, and modify the configuration items of each module. + +Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Flume**. On the displayed page, click the **Configuration Tool** tab to configure the **source**, **channel**, and **sink** parameters. 
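+
+For reference, the client configuration file uses the standard Flume properties syntax. The following minimal sketch only illustrates the naming structure; the agent name **client** and the component names **static_log_source**, **static_log_channel**, and **log_avro_sink** are examples only, and a working configuration also requires the module parameters described in the tables below:
+
+.. code-block::
+
+   client.sources = static_log_source
+   client.channels = static_log_channel
+   client.sinks = log_avro_sink
+
+   client.sources.static_log_source.type = spooldir
+   client.sources.static_log_source.channels = static_log_channel
+
+   client.channels.static_log_channel.type = file
+
+   client.sinks.log_avro_sink.type = avro
+   client.sinks.log_avro_sink.channel = static_log_channel
+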
Parameters such as **channels** and **type** are configured only in the client configuration file **properties.properties**, the path of which is *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume version*\ **/conf/properties.properties**. + +.. note:: + + You must input encrypted information for some configurations. For details on how to encrypt information, see :ref:`Using the Encryption Tool of the Flume Client `. + +Common Source Configurations +---------------------------- + +- **Avro Source** + + An Avro source listens to the Avro port, receives data from the external Avro client, and places data into configured channels. :ref:`Table 1 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_tb4f2cc56cf3945f7bba26f90f1afaa79: + + .. table:: **Table 1** Common configurations of an Avro source + + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+=======================+=======================================================================================================================================================================================+ + | channels | **-** | Specifies the channel connected to the source. Multiple channels can be configured. Use spaces to separate them. | + | | | | + | | | In a single proxy process, sources and sinks are connected through channels. A source instance corresponds to multiple channels, but a sink instance corresponds only to one channel. | + | | | | + | | | The format is as follows: | + | | | | + | | | **.sources..channels = ...** | + | | | | + | | | **.sinks..channels = ** | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | avro | Specifies the type, which is set to **avro**. The type of each source is a fixed value. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | ``-`` | Specifies the host name or IP address associated with the source. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound port number. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. 
| + | | | | + | | | - true | + | | | - false | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-type | JKS | Specifies the Java trust store type. Set this parameter to **JKS** or other truststore types supported by Java. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore | ``-`` | Specifies the Java trust store file. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-password | ``-`` | Specifies the Java trust store password. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the key storage type. Set this parameter to **JKS** or other truststore types supported by Java. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the key storage file. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-password | ``-`` | Specifies the key storage password. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **SpoolDir Source** + + A SpoolDir source monitors and transmits new files that have been added to directories in quasi-real-time mode. Common configurations are as follows: + + .. table:: **Table 2** Common configurations of a SpoolDir source + + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +============================+=======================+==========================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. 
| + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | spooldir | Type, which is set to **spooldir**. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spoolDir | ``-`` | Specifies the monitoring directory. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileSuffix | .COMPLETED | Specifies the suffix added after file transmission is complete. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deletePolicy | never | Specifies the source file deletion policy after file transmission is complete. The value can be either **never** or **immediate**. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ignorePattern | ^$ | Specifies the regular expression of a file to be ignored. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | trackerDir | .flumespool | Specifies the metadata storage path during data transmission. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the source transmission granularity. 
| + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | decodeErrorPolicy | FAIL | Specifies the code error policy. This parameter can be configured only in the **properties.properties** file. | + | | | | + | | | The value can be **FAIL**, **REPLACE**, or **IGNORE**. | + | | | | + | | | **FAIL**: Generate an exception and fail the parsing. | + | | | | + | | | **REPLACE**: Replace the characters that cannot be identified with other characters, such as U+FFFD. | + | | | | + | | | **IGNORE**: Discard character strings that cannot be parsed. | + | | | | + | | | .. note:: | + | | | | + | | | If a code error occurs in the file, set **decodeErrorPolicy** to **REPLACE** or **IGNORE**. Flume will skip the code error and continue to collect subsequent logs. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer | LINE | Specifies the file parser. The value can be either **LINE** or **BufferedLine**. | + | | | | + | | | - When the value is set to **LINE**, characters read from the file are transcoded one by one. | + | | | - When the value is set to **BufferedLine**, one line or multiple lines of characters read from the file are transcoded in batches, which delivers better performance. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer.maxLineLength | 2048 | Specifies the maximum length for resolution by line, ranging from 0 to 2,147,483,647. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer.maxBatchLine | 1 | Specifies the maximum number of lines for resolution by line. If multiple lines are set, **maxLineLength** must be set to a corresponding multiplier. For example, if **maxBatchLine** is set to **2**, **maxLineLength** is set to **4096** (2048 x 2). | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | selector.type | replicating | Specifies the selector type. The value can be either **replicating** or **multiplexing**. | + | | | | + | | | - **replicating** indicates that the same content is sent to each channel. | + | | | - **multiplexing** indicates that the content is sent only to certain channels according to the distribution rule. 
| + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | interceptors | ``-`` | Specifies the interceptor. For details, see the `Flume official document `__. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +----------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + The Spooling source ignores the last line feed character of each event when data is read by line. Therefore, Flume does not calculate the data volume counters used by the last line feed character. + +- **Kafka Source** + + A Kafka source consumes data from Kafka topics. Multiple sources can consume data of the same topic, and the sources consume different partitions of the topic. Common configurations are as follows: + + .. table:: **Table 3** Common configurations of a Kafka source + + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=================================+===========================================+====================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | org.apache.flume.source.kafka.KafkaSource | Specifies the type, which is set to **org.apache.flume.source.kafka.KafkaSource**. | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | nodatatime | 0 (Disabled) | Specifies the alarm threshold. An alarm is triggered when the duration that Kafka does not release data to subscribers exceeds the threshold. 
Unit: second | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written into a channel at a time. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchDurationMillis | 1000 | Specifies the maximum duration of topic data consumption at a time, expressed in milliseconds. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keepTopicInHeader | false | Specifies whether to save topics in the event header. If topics are saved, topics configured in Kafka sinks become invalid. | + | | | | + | | | - true | + | | | - false | + | | | | + | | | This parameter can be configured only in the **properties.properties** file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keepPartitionInHeader | false | Specifies whether to save partition IDs in the event header. If partition IDs are saved, Kafka sinks write data to the corresponding partitions. | + | | | | + | | | - true | + | | | - false | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the list of Broker addresses, which are separated by commas. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | ``-`` | Specifies the Kafka consumer group ID. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics | ``-`` | Specifies the list of subscribed Kafka topics, which are separated by commas (,). | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics.regex | ``-`` | Specifies the subscribed topics that comply with regular expressions. **kafka.topics.regex** has a higher priority than **kafka.topics** and will overwrite **kafka.topics**. 
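+
+   As an illustrative sketch only, a Kafka source could be declared as follows; the agent name **client**, the component names, the broker addresses, and the topic names are placeholders that must be replaced with values from the target cluster::
+
+      # Illustrative fragment; assumes the channel "kafka_src_channel" is defined elsewhere in this file.
+      client.sources = kafka_src
+      client.sources.kafka_src.type = org.apache.flume.source.kafka.KafkaSource
+      client.sources.kafka_src.kafka.bootstrap.servers = broker-1:21007,broker-2:21007
+      client.sources.kafka_src.kafka.topics = topic1,topic2
+      client.sources.kafka_src.kafka.consumer.group.id = flume
+      client.sources.kafka_src.batchSize = 1000
+      client.sources.kafka_src.channels = kafka_src_channel
+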
| + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.security.protocol | SASL_PLAINTEXT | Specifies the security protocol of Kafka. The value must be set to **PLAINTEXT** for clusters in which Kerberos authentication is disabled. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.kerberos.domain.name | ``-`` | Specifies the value of **default_realm** of Kerberos in the Kafka cluster, which should be configured only for security clusters. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Other Kafka Consumer Properties | ``-`` | Specifies other Kafka configurations. This parameter can be set to any consumption configuration supported by Kafka, and the **.kafka** prefix must be added to the configuration. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +---------------------------------+-------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Taildir Source** + + A Taildir source monitors file changes in a directory and automatically reads the file content. In addition, it can transmit data in real time. :ref:`Table 4 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_t2c85090722c4451682fad2657a7bdc35: + + .. table:: **Table 4** Common configurations of a Taildir source + + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +========================================+=======================+==============================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | taildir | Specifies the type, which is set to **taildir**. 
| + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups | ``-`` | Specifies the group name of a collection file directory. Group names are separated by spaces. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups..parentDir | ``-`` | Specifies the parent directory. The value must be an absolute path. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups..filePattern | ``-`` | Specifies the relative file path of the file group's parent directory. Directories can be included and regular expressions are supported. It must be used together with **parentDir**. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | positionFile | ``-`` | Specifies the metadata storage path during data transmission. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | headers.. | ``-`` | Specifies the key-value of an event when data of a group is being collected. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | byteOffsetHeader | false | Specifies whether each event header should contain the location information about the event in the source file. The location information is saved in the **byteoffset** variable. 
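+
+   The following sketch shows how a Taildir source might be assembled from these parameters; the agent name, the file group name **fg1**, and all paths are hypothetical examples::
+
+      # Illustrative fragment; assumes the channel "taildir_channel" is defined elsewhere in this file.
+      client.sources = taildir_src
+      client.sources.taildir_src.type = taildir
+      client.sources.taildir_src.filegroups = fg1
+      client.sources.taildir_src.filegroups.fg1.parentDir = /var/log/app
+      client.sources.taildir_src.filegroups.fg1.filePattern = .*\.log
+      client.sources.taildir_src.positionFile = /var/log/flume/taildir_position.json
+      client.sources.taildir_src.channels = taildir_channel
+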
| + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | skipToEnd | false | Specifies whether Flume can locate the latest location of a file and read the latest data after restart. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | idleTimeout | 120000 | Specifies the idle duration during file reading, expressed in milliseconds. If the file data is not changed in this idle period, the source closes the file. If data is written into this file after it is closed, the source opens the file and reads data. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | writePosInterval | 3000 | Specifies the interval for writing metadata to a file, expressed in milliseconds. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written to the channel in batches. | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +----------------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Http Source** + + An HTTP source receives data from an external HTTP client and sends the data to the configured channels. :ref:`Table 5 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_t033eef1276424185b1cfd10a7d4e024f: + + .. 
table:: **Table 5** Common configurations of an HTTP source + + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+==========================================+=======================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. This parameter can be set only in the properties.properties file. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | http | Specifies the type, which is set to **http**. This parameter can be set only in the properties.properties file. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | ``-`` | Specifies the name or IP address of the bound host. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound port. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | handler | org.apache.flume.source.http.JSONHandler | Specifies the message parsing method of an HTTP request. The following methods are supported: | + | | | | + | | | - **org.apache.flume.source.http.JSONHandler**: JSON | + | | | - **org.apache.flume.sink.solr.morphline.BlobHandler**: BLOB | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | handler.\* | ``-`` | Specifies handler parameters. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enableSSL | false | Specifies whether SSL is enabled in HTTP. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the keystore path set after SSL is enabled in HTTP. | + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystorePassword | ``-`` | Specifies the keystore password set after SSL is enabled in HTTP. 
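+
+   For example, an HTTP source that accepts JSON events could be sketched as follows; the agent name, bind address, and port are placeholders chosen for illustration::
+
+      # Illustrative fragment; assumes the channel "http_channel" is defined elsewhere in this file.
+      client.sources = http_src
+      client.sources.http_src.type = http
+      client.sources.http_src.bind = 192.168.0.10
+      client.sources.http_src.port = 21154
+      client.sources.http_src.handler = org.apache.flume.source.http.JSONHandler
+      client.sources.http_src.enableSSL = false
+      client.sources.http_src.channels = http_channel
+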
| + +-----------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Common Channel Configurations +----------------------------- + +- **Memory Channel** + + A memory channel uses memory as the cache. Events are stored in memory queues. :ref:`Table 6 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_tc1421df5bc6c415ca490e671ea935f85: + + .. table:: **Table 6** Common configurations of a memory channel + + +---------------------+---------------+-------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=====================+===============+===================================================================================================================+ + | type | ``-`` | Specifies the type, which is set to **memory**. This parameter can be set only in the properties.properties file. | + +---------------------+---------------+-------------------------------------------------------------------------------------------------------------------+ + | capacity | 10000 | Specifies the maximum number of events cached in a channel. | + +---------------------+---------------+-------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | 1000 | Specifies the maximum number of events accessed each time. | + +---------------------+---------------+-------------------------------------------------------------------------------------------------------------------+ + | channelfullcount | 10 | Specifies the channel full count. When the count reaches the threshold, an alarm is reported. | + +---------------------+---------------+-------------------------------------------------------------------------------------------------------------------+ + +- **File Channel** + + A file channel uses local disks as the cache. Events are stored in the folder specified by **dataDirs**. :ref:`Table 7 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_td180d6190e86420d8779010b90877938: + + .. table:: **Table 7** Common configurations of a file channel + + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +======================+=======================================+=================================================================================================================================================+ + | type | ``-`` | Specifies the type, which is set to **file**. This parameter can be set only in the properties.properties file. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointDir | ${BIGDATA_DATA_HOME}/flume/checkpoint | Specifies the checkpoint storage directory. 
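+
+   A memory channel using the parameters in Table 6 might be declared as follows; this is a sketch with placeholder names rather than a mandated configuration::
+
+      # Illustrative fragment; sources and sinks referencing "mem_channel" are defined elsewhere in this file.
+      client.channels = mem_channel
+      client.channels.mem_channel.type = memory
+      client.channels.mem_channel.capacity = 10000
+      client.channels.mem_channel.transactionCapacity = 1000
+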
| + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDirs | ${BIGDATA_DATA_HOME}/flume/data | Specifies the data cache directory. Multiple directories can be configured to improve performance. The directories are separated by commas (,). | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxFileSize | 2146435071 | Specifies the maximum size of a single cache file, expressed in bytes. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | minimumRequiredSpace | 524288000 | Specifies the minimum idle space in the cache, expressed in bytes. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | capacity | 1000000 | Specifies the maximum number of events cached in a channel. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | 10000 | Specifies the maximum number of events accessed each time. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | channelfullcount | 10 | Specifies the channel full count. When the count reaches the threshold, an alarm is reported. | + +----------------------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Kafka Channel** + + A Kafka channel uses a Kafka cluster as the cache. Kafka provides high availability and multiple copies to prevent data from being immediately consumed by sinks when Flume or Kafka Broker crashes. :ref:`Table 10 Common configurations of a Kafka channel ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_ta58e4ea5e98446418e498b81cf0c75b7: + + .. table:: **Table 8** Common configurations of a Kafka channel + + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==================================+=======================+=================================================================================================================+ + | type | ``-`` | Specifies the type, which is set to **org.apache.flume.channel.kafka.KafkaChannel**. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the list of Brokers in the Kafka cluster. 
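+
+   Similarly, a file channel based on Table 7 could be sketched as shown below; the agent and channel names are placeholders, and the directories simply reuse the documented defaults::
+
+      # Illustrative fragment; sources and sinks referencing "file_channel" are defined elsewhere in this file.
+      client.channels = file_channel
+      client.channels.file_channel.type = file
+      client.channels.file_channel.checkpointDir = ${BIGDATA_DATA_HOME}/flume/checkpoint
+      client.channels.file_channel.dataDirs = ${BIGDATA_DATA_HOME}/flume/data
+      client.channels.file_channel.capacity = 1000000
+      client.channels.file_channel.transactionCapacity = 10000
+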
| + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.topic | flume-channel | Specifies the Kafka topic used by the channel to cache data. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | flume | Specifies the Kafka consumer group ID. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | parseAsFlumeEvent | true | Specifies whether data is parsed into Flume events. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | migrateZookeeperOffsets | true | Specifies whether to search for offsets in ZooKeeper and submit them to Kafka when there is no offset in Kafka. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.auto.offset.reset | latest | Consumes data from the specified location when there is no offset. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.producer.security.protocol | SASL_PLAINTEXT | Specifies the Kafka producer security protocol. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.security.protocol | SASL_PLAINTEXT | Specifies the Kafka consumer security protocol. | + +----------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+ + +Common Sink Configurations +-------------------------- + +- **HDFS Sink** + + An HDFS sink writes data into HDFS. :ref:`Table 9 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_t3f4509459f734167afdd0cb20857d2ef: + + .. table:: **Table 9** Common configurations of an HDFS sink + + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==========================+=======================+=====================================================================================================================================================================================================================================================+ + | channel | **-** | Specifies the channel connected to the sink. This parameter can be set only in the properties.properties file. 
| + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | hdfs | Specifies the type, which is set to **hdfs**. This parameter can be set only in the properties.properties file. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the sink is restarted. Unit: second | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | ``-`` | Specifies the HDFS path. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUseSuffix | .tmp | Specifies the suffix of the HDFS file to which data is being written. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollInterval | 30 | Specifies the interval for file rolling, expressed in seconds. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollSize | 1024 | Specifies the size for file rolling, expressed in bytes. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollCount | 10 | Specifies the number of events for file rolling. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.idleTimeout | 0 | Specifies the timeout interval for closing idle files automatically, expressed in seconds. 
| + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | 1000 | Specifies the number of events written into HDFS at a time. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | ``-`` | Specifies the Kerberos username for HDFS authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | ``-`` | Specifies the Kerberos keytab of HDFS authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.fileCloseByEndEvent | true | Specifies whether to close the file when the last event is received. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchCallTimeout | ``-`` | Specifies the timeout control duration each time events are written into HDFS, expressed in milliseconds. | + | | | | + | | | If this parameter is not specified, the timeout duration is controlled when each event is written into HDFS. When the value of **hdfs.batchSize** is greater than 0, configure this parameter to improve the performance of writing data into HDFS. | + | | | | + | | | .. note:: | + | | | | + | | | The value of **hdfs.batchCallTimeout** depends on **hdfs.batchSize**. A greater **hdfs.batchSize** requires a larger **hdfs.batchCallTimeout**. If the value of **hdfs.batchCallTimeout** is too small, writing events to HDFS may fail. | + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | serializer.appendNewline | true | Specifies whether to add a line feed character (**\\n**) after an event is written to HDFS. If a line feed character is added, the data volume counters used by the line feed character will not be calculated by HDFS sinks. 
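+
+   Putting the parameters above together, an HDFS sink could be sketched as follows; the agent name, channel name, HDFS path, and the Kerberos principal and keytab are placeholders for illustration only::
+
+      # Illustrative fragment; assumes the channel "hdfs_channel" is defined elsewhere in this file.
+      client.sinks = hdfs_sink
+      client.sinks.hdfs_sink.type = hdfs
+      client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/flume/data
+      client.sinks.hdfs_sink.hdfs.batchSize = 1000
+      client.sinks.hdfs_sink.hdfs.rollInterval = 30
+      client.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume_user
+      client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/flume/conf/user.keytab
+      client.sinks.hdfs_sink.channel = hdfs_channel
+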
| + +--------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Avro Sink** + + An Avro sink converts events into Avro events and sends them to the monitoring ports of the hosts. :ref:`Table 10 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_tcf9863ee677d41a6882b71987541fa33: + + .. table:: **Table 10** Common configurations of an Avro sink + + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=====================+===============+=================================================================================================================+ + | channel | **-** | Specifies the channel connected to the sink. This parameter can be set only in the properties.properties file. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type, which is set to **avro**. This parameter can be set only in the properties.properties file. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | hostname | ``-`` | Specifies the name or IP address of the bound host. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the monitoring port. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | batch-size | 1000 | Specifies the number of events sent in a batch. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | truststore-type | JKS | Specifies the Java trust store type. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | truststore | ``-`` | Specifies the Java trust store file. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | truststore-password | ``-`` | Specifies the Java trust store password. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the key storage type. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the key storage file. 
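+
+   For example (all names, addresses, and file paths below are hypothetical), an Avro sink with SSL enabled might look like this::
+
+      # Illustrative fragment; assumes the channel "avro_channel" is defined elsewhere in this file.
+      client.sinks = avro_sink
+      client.sinks.avro_sink.type = avro
+      client.sinks.avro_sink.hostname = 192.168.0.21
+      client.sinks.avro_sink.port = 21154
+      client.sinks.avro_sink.batch-size = 1000
+      client.sinks.avro_sink.ssl = true
+      client.sinks.avro_sink.keystore = /opt/flume/conf/flume.keystore.jks
+      client.sinks.avro_sink.keystore-password = xxx
+      client.sinks.avro_sink.channel = avro_channel
+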
| + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + | keystore-password | ``-`` | Specifies the key storage password. | + +---------------------+---------------+-----------------------------------------------------------------------------------------------------------------+ + +- **HBase Sink** + + An HBase sink writes data into HBase. :ref:`Table 11 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_tf429beac69444e93a744abfe1d0fb744: + + .. table:: **Table 11** Common configurations of an HBase sink + + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +===================+===============+======================================================================================================================================================+ + | channel | **-** | Specifies the channel connected to the sink. This parameter can be set only in the properties.properties file. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type, which is set to **hbase**. This parameter can be set only in the properties.properties file. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table | ``-`` | Specifies the HBase table name. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the sink is restarted. Unit: second | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | columnFamily | ``-`` | Specifies the HBase column family. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written into HBase at a time. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosPrincipal | ``-`` | Specifies the Kerberos username for HBase authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled. | + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosKeytab | ``-`` | Specifies the Kerberos keytab of HBase authentication. This parameter is not required for a cluster in which Kerberos authentication is disabled. 
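+
+   An HBase sink could be sketched as follows; the table name, column family, and authentication files are hypothetical examples and must be replaced with real values::
+
+      # Illustrative fragment; assumes the channel "hbase_channel" is defined elsewhere in this file.
+      client.sinks = hbase_sink
+      client.sinks.hbase_sink.type = hbase
+      client.sinks.hbase_sink.table = flume_table
+      client.sinks.hbase_sink.columnFamily = cf
+      client.sinks.hbase_sink.batchSize = 1000
+      client.sinks.hbase_sink.kerberosPrincipal = flume_user
+      client.sinks.hbase_sink.kerberosKeytab = /opt/flume/conf/user.keytab
+      client.sinks.hbase_sink.channel = hbase_channel
+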
| + +-------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Kafka Sink** + + A Kafka sink writes data into Kafka. :ref:`Table 12 ` lists common configurations. + + .. _mrs_01_0396__en-us_topic_0000001173949378_tf898876f2a2f45629655554005c3f0a8: + + .. table:: **Table 12** Common configurations of a Kafka sink + + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=================================+=======================+===================================================================================================================================================================================+ + | channel | **-** | Specifies the channel connected to the sink. This parameter can be set only in the properties.properties file. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type, which is set to **org.apache.flume.sink.kafka.KafkaSink**. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the list of Kafka Brokers, which are separated by commas. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the sink is restarted. Unit: second | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topic | default-flume-topic | Specifies the topic where data is written. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flumeBatchSize | 1000 | Specifies the number of events written into Kafka at a time. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.security.protocol | SASL_PLAINTEXT | Specifies the security protocol of Kafka. The value must be set to **PLAINTEXT** for clusters in which Kerberos authentication is disabled. 
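+
+   A Kafka sink might be declared as shown below; the broker addresses and the topic name are placeholders and must match the target Kafka cluster::
+
+      # Illustrative fragment; assumes the channel "kafka_sink_channel" is defined elsewhere in this file.
+      client.sinks = kafka_sink
+      client.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
+      client.sinks.kafka_sink.kafka.bootstrap.servers = broker-1:21007,broker-2:21007
+      client.sinks.kafka_sink.kafka.topic = flume_topic
+      client.sinks.kafka_sink.flumeBatchSize = 1000
+      client.sinks.kafka_sink.kafka.security.protocol = SASL_PLAINTEXT
+      client.sinks.kafka_sink.channel = kafka_sink_channel
+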
| + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.kerberos.domain.name | ``-`` | Specifies the Kafka domain name. This parameter is mandatory for a security cluster. This parameter can be set only in the properties.properties file. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Other Kafka Producer Properties | ``-`` | Specifies other Kafka configurations. This parameter can be set to any production configuration supported by Kafka, and the **.kafka** prefix must be added to the configuration. | + | | | | + | | | This parameter can be set only in the properties.properties file. | + +---------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flume/flume_service_configuration_guide.rst b/doc/component-operation-guide-lts/source/using_flume/flume_service_configuration_guide.rst new file mode 100644 index 0000000..ec1db6b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/flume_service_configuration_guide.rst @@ -0,0 +1,781 @@ +:original_name: mrs_01_1057.html + +.. _mrs_01_1057: + +Flume Service Configuration Guide +================================= + +This configuration guide describes how to configure common Flume services. For non-common Source, Channel, and Sink configuration, see the user manual provided by the Flume community. + +.. note:: + + - Parameters in bold in the following tables are mandatory. + - The value of **BatchSize** of the Sink must be less than that of **transactionCapacity** of the Channel. + - Only some parameters of Source, Channel, and Sink are displayed on the Flume configuration tool page. For details, see the following configurations. + - The Customer Source, Customer Channel, and Customer Sink displayed on the Flume configuration tool page need to be configured based on self-developed code. The following common configurations are not displayed. + +Common Source Configurations +---------------------------- + +- **Avro Source** + + An Avro source listens to the Avro port, receives data from the external Avro client, and places data into configured channels. Common configurations are as follows: + + .. table:: **Table 1** Common configurations of an Avro source + + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+=======================+=====================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. 
| + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | avro | Specifies the type of the avro source, which must be **avro**. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | ``-`` | Specifies the listening host name/IP address. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound listening port. Ensure that this port is not occupied. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | threads | ``-`` | Specifies the maximum number of source threads. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-type | none | Specifies the message compression format, which can be set to **none** or **deflate**. **none** indicates that data is not compressed, while **deflate** indicates that data is compressed. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-level | 6 | Specifies the data compression level, which ranges from **1** to **9**. The larger the value is, the higher the compression rate is. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. If this parameter is set to **true**, the values of **keystore** and **keystore-password** must be specified. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-type | JKS | Specifies the Java trust store type, which can be set to **JKS** or **PKCS12**. | + | | | | + | | | .. 
note:: | + | | | | + | | | Different passwords are used to protect the key store and private key of **JKS**, while the same password is used to protect the key store and private key of **PKCS12**. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore | ``-`` | Specifies the Java trust store file. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-password | ``-`` | Specifies the Java trust store password. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the keystore type set after SSL is enabled, which can be set to **JKS** or **PKCS12**. | + | | | | + | | | .. note:: | + | | | | + | | | Different passwords are used to protect the key store and private key of **JKS**, while the same password is used to protect the key store and private key of **PKCS12**. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the keystore file path set after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-password | ``-`` | Specifies the keystore password set after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | trust-all-certs | false | Specifies whether to disable the check for the SSL server certificate. If this parameter is set to **true**, the SSL server certificate of the remote source is not checked. You are not advised to perform this operation during the production. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | exclude-protocols | SSLv3 | Specifies the excluded protocols. The entered protocols must be separated by spaces. The default value is **SSLv3**. 
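+
+   As a sketch only (the agent name **server**, the bind address, the port, and the keystore paths are placeholders), an Avro source with SSL enabled could be configured as follows::
+
+      # Illustrative fragment; assumes the channel "avro_channel" is defined elsewhere in this file.
+      server.sources = avro_src
+      server.sources.avro_src.type = avro
+      server.sources.avro_src.bind = 192.168.0.21
+      server.sources.avro_src.port = 21154
+      server.sources.avro_src.ssl = true
+      server.sources.avro_src.keystore = /opt/flume/conf/flume.keystore.jks
+      server.sources.avro_src.keystore-password = xxx
+      server.sources.avro_src.channels = avro_channel
+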
| + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ipFilter | false | Specifies whether to enable the IP address filtering. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ipFilter.rules | ``-`` | Specifies the rules of *N* network **ipFilters**. Host names or IP addresses must be separated by commas (,). If this parameter is set to **true**, there are two configuration rules: allow and forbidden. The configuration format is as follows: | + | | | | + | | | ipFilterRules=allow:ip:127.*, allow:name:localhost, deny:ip:\* | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **SpoolDir Source** + + SpoolDir Source monitors and transmits new files that have been added to directories in real-time mode. Common configurations are as follows: + + .. table:: **Table 2** Common configurations of a Spooling Directory source + + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +============================+=======================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. 
| + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | spooldir | Specifies the type of the spooling source, which must be set to **spooldir**. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spoolDir | ``-`` | Specifies the monitoring directory of the Spooldir source. A Flume running user must have the read, write, and execution permissions on the directory. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileSuffix | .COMPLETED | Specifies the suffix added after file transmission is complete. 
| + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deletePolicy | never | Specifies the source file deletion policy after file transmission is complete. The value can be either **never** or **immediate**. **never** indicates that the source file is not deleted after file transmission is complete, while **immediate** indicates that the source file is immediately deleted after file transmission is complete. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ignorePattern | ^$ | Specifies the regular expression of a file to be ignored. The default value is ^$, indicating that spaces are ignored. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | includePattern | ^.*$ | Specifies the regular expression that contains a file. This parameter can be used together with **ignorePattern**. If a file meets both **ignorePattern** and **includePattern**, the file is ignored. In addition, when a file starts with a period (.), the file will not be filtered. 
| + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | trackerDir | .flumespool | Specifies the metadata storage path during data transmission. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written to the channel in batches. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | decodeErrorPolicy | FAIL | Specifies the code error policy. | + | | | | + | | | .. note:: | + | | | | + | | | If a code error occurs in the file, set **decodeErrorPolicy** to **REPLACE** or **IGNORE**. Flume will skip the code error and continue to collect subsequent logs. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer | LINE | Specifies the file parser. The value can be either **LINE** or **BufferedLine**. | + | | | | + | | | - When the value is set to **LINE**, characters read from the file are transcoded one by one. 
| + | | | - When the value is set to **BufferedLine**, one line or multiple lines of characters read from the file are transcoded in batches, which delivers better performance. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer.maxLineLength | 2048 | Specifies the maximum length for resolution by line. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | deserializer.maxBatchLine | 1 | Specifies the maximum number of lines for resolution by line. If multiple lines are set, **maxLineLength** must be set to a corresponding multiplier. | + | | | | + | | | .. note:: | + | | | | + | | | When configuring the Interceptor, take the multi-line combination into consideration to avoid data loss. If the Interceptor cannot process combined lines, set this parameter to **1**. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | selector.type | replicating | Specifies the selector type. The value can be either **replicating** or **multiplexing**. **replicating** indicates that data is replicated and then transferred to each channel so that each channel receives the same data, while **multiplexing** indicates that a channel is selected based on the value of the header in the event and each channel has different data. 
| + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | interceptors | ``-`` | Specifies the interceptor. Multiple interceptors are separated by spaces. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | inputCharset | UTF-8 | Specifies the encoding format of a read file. The encoding format must be the same as that of the data source file that has been read. Otherwise, an error may occur during character parsing. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileHeader | false | Specifies whether to add the file name (including the file path) to the event header. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileHeaderKey | ``-`` | Specifies that the data storage structure in header is set in the mode. Parameters **fileHeaderKey** and **fileHeader** must be used together. Following is an example if **fileHeader** is set to true: | + | | | | + | | | Define **fileHeaderKey** as **file**. 
When the **/root/a.txt** file is read, **fileHeaderKey** exists in the header in the **file=/root/a.txt** format. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | basenameHeader | false | Specifies whether to add the file name (excluding the file path) to the event header. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | basenameHeaderKey | ``-`` | Specifies that the data storage structure in header is set in the mode. Parameters **basenameHeaderKey** and **basenameHeader** must be used together. Following is an example if **basenameHeader** is set to true: | + | | | | + | | | Define **basenameHeaderKey** as **file**. When the **a.txt** file is read, **basenameHeaderKey** exists in the header in the **file=a.txt** format. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | pollDelay | 500 | Specifies the delay for polling new files in the monitoring directory. 
Unit: milliseconds | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | recursiveDirectorySearch | false | Specifies whether to monitor new files in the subdirectory of the configured directory. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | consumeOrder | oldest | Specifies the consumption order of files in a directory. If this parameter is set to **oldest** or **youngest**, the sequence of files to be read is determined by the last modification time of files in the monitored directory. If there are a large number of files in the directory, it takes a long time to search for **oldest** or **youngest** files. If this parameter is set to **random**, an earlier created file may not be read for a long time. If this parameter is set to **oldest** or **youngest**, it takes a long time to find the latest and the earliest file. The options are as follows: **random**, **youngest**, and **oldest**. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxBackoff | 4000 | Specifies the maximum time to wait between consecutive attempts to write to a channel if the channel is full. If the time exceeds the threshold, an exception is thrown. The corresponding source starts to write at a smaller time value. Each time the source attempts, the digital exponent increases until the current specified value is reached. If data cannot be written, the data write fails. 
Unit: second | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | emptyFileEvent | true | Specifies whether to collect empty file information and send it to the sink end. The default value is **true**, indicating that empty file information is sent to the sink end. This parameter is valid only for HDFS Sink. Taking HDFS Sink as an example, if this parameter is set to **true** and an empty file exists in the **spoolDir** directory, an empty file with the same name will be created in the **hdfs.path** directory of HDFS. | + +----------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + SpoolDir Source ignores the last line feed character of each event when data is reading by row. Therefore, Flume does not calculate the data volume counters used by the last line feed character. + +- **Kafka Source** + + A Kafka source consumes data from Kafka topics. Multiple sources can consume data of the same topic, and the sources consume different partitions of the topic. Common configurations are as follows: + + .. 
table:: **Table 3** Common configurations of a Kafka source + + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=================================+===========================================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | org.apache.flume.source.kafka.KafkaSource | Specifies the type of the Kafka source, which must be set to **org.apache.flume.source.kafka.KafkaSource**. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the bootstrap address port list of Kafka. If Kafka has been installed in the cluster and the configuration has been synchronized to the server, you do not need to set this parameter on the server. The default value is the list of all brokers in the Kafka cluster. This parameter must be configured on the client. Use commas (,) to separate multiple values of *IP address:Port number*. 
The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics | ``-`` | Specifies the list of subscribed Kafka topics, which are separated by commas (,). | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics.regex | ``-`` | Specifies the subscribed topics that comply with regular expressions. **kafka.topics.regex** has a higher priority than **kafka.topics** and will overwrite **kafka.topics**. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | nodatatime | 0 (Disabled) | Specifies the alarm threshold. An alarm is triggered when the duration that Kafka does not release data to subscribers exceeds the threshold. Unit: second This parameter can be configured in the **properties.properties** file. 
| + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written to the channel in batches. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchDurationMillis | 1000 | Specifies the maximum duration of topic data consumption at a time, expressed in milliseconds. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keepTopicInHeader | false | Specifies whether to save topics in the event header. If the parameter value is **true**, topics configured in Kafka Sink become invalid. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | setTopicHeader | true | If this parameter is set to **true**, the topic name defined in **topicHeader** is stored in the header. 
| + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | topicHeader | topic | When **setTopicHeader** is set to **true**, this parameter specifies the name of the topic received by the storage device. If the property is used with that of Kafka Sink **topicHeader**, be careful not to send messages to the same topic cyclically. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | useFlumeEventFormat | false | By default, an event is transferred from a Kafka topic to the body of the event in the form of bytes. If this parameter is set to **true**, the Avro binary format of Flume is used to read events. When used together with the **parseAsFlumeEvent** parameter with the same name in KafkaSink or KakfaChannel, any set **header** generated from the data source is retained. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keepPartitionInHeader | false | Specifies whether to save partition IDs in the event header. If the parameter value is **true**, Kafka Sink writes data to the corresponding partition. 
| + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | flume | Specifies the Kafka consumer group ID. Sources or proxies having the same ID are in the same consumer group. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.security.protocol | SASL_PLAINTEXT | Specifies the Kafka security protocol. The parameter value must be set to PLAINTEXT in a common cluster. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Other Kafka Consumer Properties | ``-`` | Specifies other Kafka configurations. This parameter can be set to any consumption configuration supported by Kafka, and the **.kafka** prefix must be added to the configuration. | + +---------------------------------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Taildir Source** + + A Taildir source monitors file changes in a directory and automatically reads the file content. In addition, it can transmit data in real time. Common configurations are as follows: + + .. 
table:: **Table 4** Common configurations of a Taildir source + + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +========================================+=======================+==========================================================================================================================================================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | TAILDIR | Specifies the type of the taildir source, which must be set to TAILDIR. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups | ``-`` | Specifies the group name of a collection file directory. Group names are separated by spaces. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups..parentDir | ``-`` | Specifies the parent directory. The value must be an absolute path. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups..filePattern | ``-`` | Specifies the relative file path of the file group's parent directory. Directories can be included and regular expressions are supported. It must be used together with **parentDir**. 
| + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | positionFile | ``-`` | Specifies the metadata storage path during data transmission. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | headers.. | ``-`` | Specifies the key-value of an event when data of a group is being collected. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | byteOffsetHeader | false | Specifies whether each event header contains the event location information in the source file. If the parameter value is true, the location information is saved in the byteoffset variable. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxBatchCount | Long.MAX_VALUE | Specifies the maximum number of batches that can be consecutively read from a file. If the monitored directory reads multiple files consecutively and one of the files is written at a rapid rate, other files may fail to be processed. This is because the file that is written at a high speed will be in an infinite read loop. In this case, set this parameter to a smaller value. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | skipToEnd | false | Specifies whether Flume can locate the latest location of a file and read the latest data after restart. If the parameter value is true, Flume locates and reads the latest file data after restart. 
| + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | idleTimeout | 120000 | Specifies the idle duration during file reading, expressed in milliseconds. If file content is not changed in the preset time duration, close the file. If data is written to this file after the file is closed, open the file and read data. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | writePosInterval | 3000 | Specifies the interval for writing metadata to a file, expressed in milliseconds. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written to the channel in batches. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the source is restarted. Unit: second | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileHeader | false | Specifies whether to add the file name (including the file path) to the event header. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | fileHeaderKey | file | Specifies that the data storage structure in header is set in the mode. 
Parameters **fileHeaderKey** and **fileHeader** must be used together. Following is an example if **fileHeader** is set to true: | + | | | | + | | | Define **fileHeaderKey** as **file**. When the **/root/a.txt** file is read, **fileHeaderKey** exists in the header in the **file=/root/a.txt** format. | + +----------------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Http Source** + + An HTTP source receives data from an external HTTP client and sends the data to the configured channels. Common configurations are as follows: + + .. table:: **Table 5** Common configurations of an HTTP source + + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+==========================================+==================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | http | Specifies the type of the http source, which must be set to http. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | ``-`` | Specifies the listening host name/IP address. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound listening port. Ensure that this port is not occupied. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | handler | org.apache.flume.source.http.JSONHandler | Specifies the message parsing method of an HTTP request. Two formats are supported: JSON (org.apache.flume.source.http.JSONHandler) and BLOB (org.apache.flume.sink.solr.morphline.BlobHandler). 
| + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | handler.\* | ``-`` | Specifies handler parameters. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | exclude-protocols | SSLv3 | Specifies the excluded protocols. The entered protocols must be separated by spaces. The default value is **SSLv3**. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | include-cipher-suites | ``-`` | Specifies the included protocols. The entered protocols must be separated by spaces. If this parameter is left empty, all protocols are supported by default. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enableSSL | false | Specifies whether SSL is enabled in HTTP. If this parameter is set to **true**, the values of **keystore** and **keystore-password** must be specified. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the keystore type, which can be **JKS** or **PKCS12**. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the keystore path set after SSL is enabled in HTTP. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystorePassword | ``-`` | Specifies the keystore password set after SSL is enabled in HTTP. | + +-----------------------+------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Thrift Source** + + Thrift Source monitors the thrift port, receives data from the external Thrift clients, and puts the data into the configured channel. 
Common configurations are as follows: + + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+=======================+=========================================================================================================================================================================================================================================================================================================================================+ + | channels | ``-`` | Specifies the channel connected to the source. Multiple channels can be configured. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | thrift | Specifies the type of the thrift source, which must be set to **thrift**. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | ``-`` | Specifies the listening host name/IP address. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound listening port. Ensure that this port is not occupied. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | threads | ``-`` | Specifies the maximum number of worker threads that can be run. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberos | false | Specifies whether Kerberos authentication is enabled. 
| + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | agent-keytab | ``-`` | Specifies the address of the keytab file used by the server. The machine-machine account must be used. You are advised to use **flume/conf/flume_server.keytab** in the Flume service installation directory. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | agent-principal | ``-`` | Specifies the principal of the security user used by the server. The principal must be a machine-machine account. You are advised to use the default user of Flume: flume_server/hadoop.\ **\ @\ ** | + | | | | + | | | .. note:: | + | | | | + | | | **flume_server/hadoop.**\ <*system domain name*> is the username. All letters in the system domain name contained in the username are lowercase letters. For example, **Local Domain** is set to **9427068F-6EFA-4833-B43E-60CB641E5B6C.COM**, and the username is **flume_server/hadoop.9427068f-6efa-4833-b43e-60cb641e5b6c.com**. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-type | none | Specifies the message compression format, which can be set to **none** or **deflate**. **none** indicates that data is not compressed, while **deflate** indicates that data is compressed. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. If this parameter is set to **true**, the values of **keystore** and **keystore-password** must be specified. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the keystore type set after SSL is enabled. 
| + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the keystore file path set after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-password | ``-`` | Specifies the keystore password set after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +-----------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Common Channel Configurations +----------------------------- + +- **Memory Channel** + + A memory channel uses memory as the cache. Events are stored in memory queues. Common configurations are as follows: + + .. table:: **Table 6** Common configurations of a memory channel + + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==============================+===============================+=========================================================================================================================================================+ + | type | ``-`` | Specifies the type of the memory channel, which must be set to **memory**. | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | capacity | 10000 | Specifies the maximum number of events cached in a channel. | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | 1000 | Specifies the maximum number of events accessed each time. | + | | | | + | | | .. note:: | + | | | | + | | | - The parameter value must be greater than the batchSize of the source and sink. | + | | | - The value of **transactionCapacity** must be less than or equal to that of **capacity**. | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | channelfullcount | 10 | Specifies the channel full count. When the count reaches the threshold, an alarm is reported. 
| + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keep-alive | 3 | Specifies the waiting time of the Put and Take threads when the transaction or channel cache is full. Unit: second | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | byteCapacity | 80% of the maximum JVM memory | Specifies the total bytes of all event bodies in a channel. The default value is the 80% of the maximum JVM memory (indicated by **-Xmx**). Unit: bytes | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | byteCapacityBufferPercentage | 20 | Specifies the percentage of bytes in a channel (%). | + +------------------------------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **File Channel** + + A file channel uses local disks as the cache. Events are stored in the folder specified by **dataDirs**. Common configurations are as follows: + + .. table:: **Table 7** Common configurations of a file channel + + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+======================================================+=================================================================================================================================================+ + | type | ``-`` | Specifies the type of the file channel, which must be set to **file**. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointDir | ${BIGDATA_DATA_HOME}/hadoop/data1~N/flume/checkpoint | Specifies the checkpoint storage directory. | + | | | | + | | .. note:: | | + | | | | + | | This path is changed with the custom data path. | | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDirs | ${BIGDATA_DATA_HOME}/hadoop/data1~N/flume/data | Specifies the data cache directory. Multiple directories can be configured to improve performance. The directories are separated by commas (,). | + | | | | + | | .. note:: | | + | | | | + | | This path is changed with the custom data path. | | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxFileSize | 2146435071 | Specifies the maximum size of a single cache file, expressed in bytes. 
| + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | minimumRequiredSpace | 524288000 | Specifies the minimum idle space in the cache, expressed in bytes. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | capacity | 1000000 | Specifies the maximum number of events cached in a channel. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | 10000 | Specifies the maximum number of events accessed each time. | + | | | | + | | | .. note:: | + | | | | + | | | - The parameter value must be greater than the batchSize of the source and sink. | + | | | - The value of **transactionCapacity** must be less than or equal to that of **capacity**. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | channelfullcount | 10 | Specifies the channel full count. When the count reaches the threshold, an alarm is reported. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | useDualCheckpoints | false | Specifies the backup checkpoint. If this parameter is set to **true**, the **backupCheckpointDir** parameter value must be set. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | backupCheckpointDir | ``-`` | Specifies the path of the backup checkpoint. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointInterval | 30000 | Specifies the check interval, expressed in seconds. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | keep-alive | 3 | Specifies the waiting time of the Put and Take threads when the transaction or channel cache is full. Unit: second | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | use-log-replay-v1 | false | Specifies whether to enable the old reply logic. 
| + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | use-fast-replay | false | Specifies whether to enable the queue reply. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointOnClose | true | Specifies that whether a checkpoint is created when a channel is disabled. | + +-----------------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Memory File Channel** + + A memory file channel uses both memory and local disks as its cache and supports message persistence. It provides similar performance as a memory channel and better performance than a file channel. This channel is currently experimental and not recommended for use in production. The following table describes common configuration items: Common configurations are as follows: + + .. table:: **Table 8** Common configurations of a memory file channel + + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================+============================================+=============================================================================================================================================================================================================================================================================================================================================================================================================+ + | type | org.apache.flume.channel.MemoryFileChannel | Specifies the type of the memory file channel, which must be set to **org.apache.flume.channel.MemoryFileChannel**. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | capacity | 50000 | Specifies the maximum number of events cached in a channel. 
| + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | 5000 | Specifies the maximum number of events processed by a transaction. | + | | | | + | | | .. note:: | + | | | | + | | | - The parameter value must be greater than the batchSize of the source and sink. | + | | | - The value of **transactionCapacity** must be less than or equal to that of **capacity**. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | subqueueByteCapacity | 20971520 | Specifies the maximum size of events that can be stored in a subqueue, expressed in bytes. | + | | | | + | | | A memory file channel uses both queues and subqueues to cache data. Events are stored in a subqueue, and subqueues are stored in a queue. | + | | | | + | | | **subqueueCapacity** and **subqueueInterval** determine the size of events that can be stored in a subqueue. **subqueueCapacity** specifies the capacity of a subqueue, and **subqueueInterval** specifies the duration that a subqueue can store events. Events in a subqueue are sent to the destination only after the subqueue reaches the upper limit of **subqueueCapacity** or **subqueueInterval**. | + | | | | + | | | .. note:: | + | | | | + | | | The value of **subqueueByteCapacity** must be greater than the number of events specified by **batchSize**. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | subqueueInterval | 2000 | Specifies the maximum duration that a subqueue can store events, expressed in milliseconds. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keep-alive | 3 | Specifies the waiting time of the Put and Take threads when the transaction or channel cache is full. 
| + | | | | + | | | Unit: second | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDir | ``-`` | Specifies the cache directory for local files. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | byteCapacity | 80% of the maximum JVM memory | Specifies the channel cache capacity. | + | | | | + | | | Unit: bytes | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-type | None | Specifies the message compression format, which can be set to **none** or **deflate**. **none** indicates that data is not compressed, while **deflate** indicates that data is compressed. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | channelfullcount | 10 | Specifies the channel full count. When the count reaches the threshold, an alarm is reported. | + +-----------------------+--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + The following is a configuration example of a memory file channel: + + .. code-block:: + + server.channels.c1.type = org.apache.flume.channel.MemoryFileChannel + server.channels.c1.dataDir = /opt/flume/mfdata + server.channels.c1.subqueueByteCapacity = 20971520 + server.channels.c1.subqueueInterval=2000 + server.channels.c1.capacity = 500000 + server.channels.c1.transactionCapacity = 40000 + +- **Kafka Channel** + + A Kafka channel uses a Kafka cluster as the cache. Kafka provides high availability and multiple copies to prevent data from being immediately consumed by sinks when Flume or Kafka Broker crashes. 
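+
+   The following is a configuration sketch of a Kafka channel, provided for reference only. The channel name (**c1**) and the broker addresses are examples; replace them with the actual values of your cluster. Table 9 describes the parameters.
+
+   .. code-block::
+
+      # The broker addresses below are examples. Port 21007 matches the security mode (SASL_PLAINTEXT),
+      # and port 9092 matches the common mode (PLAINTEXT).
+      server.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
+      server.channels.c1.kafka.bootstrap.servers = 192.168.0.1:21007,192.168.0.2:21007,192.168.0.3:21007
+      server.channels.c1.kafka.topic = flume-channel
+      server.channels.c1.kafka.consumer.group.id = flume
+      server.channels.c1.parseAsFlumeEvent = true
+      server.channels.c1.kafka.producer.security.protocol = SASL_PLAINTEXT
+      server.channels.c1.kafka.consumer.security.protocol = SASL_PLAINTEXT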
+ + .. table:: **Table 9** Common configurations of a Kafka channel + + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==================================+=======================+==========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | type | ``-`` | Specifies the type of the Kafka channel, which must be set to **org.apache.flume.channel.kafka.KafkaChannel**. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the bootstrap address port list of Kafka. | + | | | | + | | | If Kafka has been installed in the cluster and the configuration has been synchronized to the server, you do not need to set this parameter on the server. The default value is the list of all brokers in the Kafka cluster. This parameter must be configured on the client. Use commas (,) to separate multiple values of *IP address:Port number*. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topic | flume-channel | Specifies the Kafka topic used by the channel to cache data. 
| + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | flume | Specifies the data group ID obtained from Kafka. This parameter cannot be left blank. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | parseAsFlumeEvent | true | Specifies whether data is parsed into Flume events. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | migrateZookeeperOffsets | true | Specifies whether to search for offsets in ZooKeeper and submit them to Kafka when there is no offset in Kafka. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.auto.offset.reset | latest | Specifies where to consume if there is no offset record, which can be set to **earliest**, **latest**, or **none**. **earliest** indicates that the offset is reset to the initial point, **latest** indicates that the offset is set to the latest position, and **none** indicates that an exception is thrown if there is no offset. 
| + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.producer.security.protocol | SASL_PLAINTEXT | Specifies the Kafka producer security protocol. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + | | | | + | | | .. note:: | + | | | | + | | | If the parameter is not displayed, click **+** in the lower left corner of the dialog box to display all parameters. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.security.protocol | SASL_PLAINTEXT | Specifies the Kafka consumer security protocol. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | pollTimeout | 500 | Specifies the maximum timeout interval for the consumer to invoke the poll function. Unit: milliseconds | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ignoreLongMessage | false | Specifies whether to discard oversized messages. 
| + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | messageMaxLength | 1000012 | Specifies the maximum length of a message written by Flume to Kafka. | + +----------------------------------+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Common Sink Configurations +-------------------------- + +- **HDFS Sink** + + An HDFS sink writes data into HDFS. Common configurations are as follows: + + .. table:: **Table 10** Common configurations of an HDFS sink + + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +==========================+=======================+=====================================================================================================================================================================================================================================================================================================================================================================+ + | channel | ``-`` | Specifies the channel connected to the sink. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | hdfs | Specifies the type of the hdfs sink, which must be set to **hdfs**. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | ``-`` | Specifies the data storage path in HDFS. The value must start with **hdfs://hacluster/**. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the sink is restarted. Unit: second | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUseSuffix | .tmp | Specifies the suffix of the HDFS file to which data is being written. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollInterval | 30 | Specifies the interval for file rolling, expressed in seconds. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollSize | 1024 | Specifies the size for file rolling, expressed in bytes. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollCount | 10 | Specifies the number of events for file rolling. | + | | | | + | | | .. note:: | + | | | | + | | | Parameters **rollInterval**, **rollSize**, and **rollCount** can be configured at the same time. The parameter meeting the requirements takes precedence for compression. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.idleTimeout | 0 | Specifies the timeout interval for closing idle files automatically, expressed in seconds. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | 1000 | Specifies the number of events written into HDFS in batches. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | ``-`` | Specifies the Kerberos principal of HDFS authentication. This parameter is mandatory in a secure mode, but not required in a common mode. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | ``-`` | Specifies the Kerberos keytab of HDFS authentication. This parameter is not required in a common mode, but in a secure mode, the Flume running user must have the permission to access **keyTab** path in the **jaas.cof** file. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.fileCloseByEndEvent | true | Specifies whether to close the HDFS file when the last event of the source file is received. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchCallTimeout | ``-`` | Specifies the timeout control duration when events are written into HDFS in batches. Unit: milliseconds | + | | | | + | | | If this parameter is not specified, the timeout duration is controlled when each event is written into HDFS. When the value of **hdfs.batchSize** is greater than 0, configure this parameter to improve the performance of writing data into HDFS. | + | | | | + | | | .. note:: | + | | | | + | | | The value of **hdfs.batchCallTimeout** depends on **hdfs.batchSize**. A greater **hdfs.batchSize** requires a larger **hdfs.batchCallTimeout**. If the value of **hdfs.batchCallTimeout** is too small, writing events to HDFS may fail. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | serializer.appendNewline | true | Specifies whether to add a line feed character (**\\n**) after an event is written to HDFS. If a line feed character is added, the data volume counters used by the line feed character will not be calculated by HDFS sinks. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.filePrefix | over_%{basename} | Specifies the file name prefix after data is written to HDFS. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.fileSuffix | ``-`` | Specifies the file name suffix after data is written to HDFS. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | ``-`` | Specifies the prefix of the HDFS file to which data is being written. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.fileType | DataStream | Specifies the HDFS file format, which can be set to **SequenceFile**, **DataStream**, or **CompressedStream**. | + | | | | + | | | .. note:: | + | | | | + | | | If the parameter is set to **SequenceFile** or **DataStream**, output files are not compressed, and the **codeC** parameter cannot be configured. However, if the parameter is set to **CompressedStream**, the output files are compressed, and the **codeC** parameter must be configured together. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.codeC | ``-`` | Specifies the file compression format, which can be set to **gzip**, **bzip2**, **lzo**, **lzop**, or **snappy**. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.maxOpenFiles | 5000 | Specifies the maximum number of HDFS files that can be opened. If the number of opened files reaches this value, the earliest opened files are closed. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.writeFormat | Writable | Specifies the file write format, which can be set to **Writable** or **Text**. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.callTimeout | 10000 | Specifies the timeout control duration each time events are written into HDFS, expressed in milliseconds. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.threadsPoolSize | ``-`` | Specifies the number of threads used by each HDFS sink for HDFS I/O operations. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.rollTimerPoolSize | ``-`` | Specifies the number of threads used by each HDFS sink to schedule the scheduled file rolling. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.round | false | Specifies whether to round off the timestamp value. If this parameter is set to true, all time-based escape sequences (except %t) are affected. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.roundUnit | second | Specifies the unit of the timestamp value that has been rounded off, which can be set to **second**, **minute**, or **hour**. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | true | Specifies whether to enable the local timestamp. The recommended parameter value is **true**. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.closeTries | 0 | Specifies the maximum attempts for the **hdfs sink** to stop renaming a file. If the parameter is set to the default value **0**, the sink does not stop renaming the file until the file is successfully renamed. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.retryInterval | 180 | Specifies the interval of request for closing the HDFS file, expressed in seconds. | + | | | | + | | | .. note:: | + | | | | + | | | For each closing request, there are multiple RPCs working on the NameNode back and forth, which may make the NameNode overloaded if the parameter value is too small. Also, when the parameter is set to **0**, the Sink will not attempt to close the file, but opens the file or uses **.tmp** as the file name extension, if the first closing attempt fails. 
| + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.failcount | 10 | Specifies the number of times that data fails to be written to HDFS. If the number of times that the sink fails to write data to HDFS exceeds the parameter value, an alarm indicating abnormal data transmission is reported. | + +--------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Avro Sink** + + An Avro sink converts events into Avro events and sends them to the monitoring ports of the hosts. Common configurations are as follows: + + .. table:: **Table 11** Common configurations of an Avro sink + + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +===========================+=======================+=======================================================================================================================================================================================================================================================================================+ + | channel | ``-`` | Specifies the channel connected to the sink. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type of the avro sink, which must be set to **avro**. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hostname | ``-`` | Specifies the bound host name or IP address. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound listening port. Ensure that this port is not occupied. 
| + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batch-size | 1000 | Specifies the number of events sent in a batch. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | client.type | DEFAULT | Specifies the client instance type. Set this parameter based on the communication protocol used by the configured model. The options are as follows: | + | | | | + | | | - **DEFAULT**: The client instance of the AvroRPC type is returned. | + | | | - **OTHER**: NULL is returned. | + | | | - **THRIFT**: The client instance of the Thrift RPC type is returned. | + | | | - **DEFAULT_LOADBALANCING**: The client instance of the LoadBalancing RPC type is returned. | + | | | - **DEFAULT_FAILOVER**: The client instance of the Failover RPC type is returned. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. If this parameter is set to **true**, the values of **keystore** and **keystore-password** must be specified. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-type | JKS | Specifies the Java trust store type, which can be set to **JKS** or **PKCS12**. | + | | | | + | | | .. note:: | + | | | | + | | | Different passwords are used to protect the key store and private key of **JKS**, while the same password is used to protect the key store and private key of **PKCS12**. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore | ``-`` | Specifies the Java trust store file. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-password | ``-`` | Specifies the Java trust store password. 
| + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-type | JKS | Specifies the keystore type set after SSL is enabled. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore | ``-`` | Specifies the keystore file path set after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | keystore-password | ``-`` | Specifies the keystore password after SSL is enabled. This parameter is mandatory if SSL is enabled. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | connect-timeout | 20000 | Specifies the timeout for the first connection, expressed in milliseconds. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | request-timeout | 20000 | Specifies the maximum timeout for a request after the first request, expressed in milliseconds. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | reset-connection-interval | 0 | Specifies the interval between a connection failure and a second connection, expressed in seconds. If the parameter is set to **0**, the system continuously attempts to perform a connection. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-type | none | Specifies the compression type of the batch data, which can be set to **none** or **deflate**. **none** indicates that data is not compressed, while **deflate** indicates that data is compressed. This parameter value must be the same as that of the AvroSource compression-type. 
| + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-level | 6 | Specifies the compression level of batch data, which can be set to **1** to **9**. A larger value indicates a higher compression rate. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | exclude-protocols | SSLv3 | Specifies the excluded protocols. The entered protocols must be separated by spaces. The default value is **SSLv3**. | + +---------------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **HBase Sink** + + An HBase sink writes data into HBase. Common configurations are as follows: + + .. table:: **Table 12** Common configurations of an HBase sink + + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +====================+===============+===================================================================================================================================================================================================================================+ + | channel | ``-`` | Specifies the channel connected to the sink. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type of the HBase sink, which must be set to **hbase**. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table | ``-`` | Specifies the HBase table name. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | columnFamily | ``-`` | Specifies the HBase column family. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. 
When the update time exceeds the threshold, the sink is restarted. Unit: second | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | 1000 | Specifies the number of events written into HBase in batches. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosPrincipal | ``-`` | Specifies the Kerberos principal of HBase authentication. This parameter is mandatory in a secure mode, but not required in a common mode. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosKeytab | ``-`` | Specifies the Kerberos keytab of HBase authentication. This parameter is not required in a common mode, but in a secure mode, the Flume running user must have the permission to access **keyTab** path in the **jaas.cof** file. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | coalesceIncrements | true | Specifies whether to perform multiple operations on the same hbase cell in a same processing batch. Setting this parameter to **true** improves performance. | + +--------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Kafka Sink** + + A Kafka sink writes data into Kafka. Common configurations are as follows: + + .. 
table:: **Table 13** Common configurations of a Kafka sink + + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=================================+================+=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | channel | ``-`` | Specifies the channel connected to the sink. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | ``-`` | Specifies the type of the kafka sink, which must be set to **org.apache.flume.sink.kafka.KafkaSink**. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | ``-`` | Specifies the bootstrap address port list of Kafka. If Kafka has been installed in the cluster and the configuration has been synchronized to the server, you do not need to set this parameter on the server. The default value is the list of all brokers in the Kafka cluster. The client must be configured with this parameter. If there are multiple values, use commas (,) to separate the values. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). 
| + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | monTime | 0 (Disabled) | Specifies the thread monitoring threshold. When the update time exceeds the threshold, the sink is restarted. Unit: second | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.producer.acks | 1 | Successful write is determined by the number of received acknowledgement messages about replicas. The value **0** indicates that no confirm message needs to be received, the value **1** indicates that the system is only waiting for only the acknowledgement information from a leader, and the value **-1** indicates that the system is waiting for the acknowledgement messages of all replicas. If this parameter is set to **-1**, data loss can be avoided in some leader failure scenarios. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topic | ``-`` | Specifies the topic to which data is written. This parameter is mandatory. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | flumeBatchSize | 1000 | Specifies the number of events written into Kafka in batches. 
| + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.security.protocol | SASL_PLAINTEXT | Specifies the Kafka security protocol. The parameter value must be set to PLAINTEXT in a common cluster. The rules for matching ports and security protocols must be as follows: port 21007 matches the security mode (SASL_PLAINTEXT), and port 9092 matches the common mode (PLAINTEXT). | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ignoreLongMessage | false | Specifies whether to discard oversized messages. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | messageMaxLength | 1000012 | Specifies the maximum length of a message written by Flume to Kafka. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | defaultPartitionId | ``-`` | Specifies the Kafka partition ID to which the events of a channel is transferred. The **partitionIdHeader** value overwrites this parameter value. By default, if this parameter is left blank, events will be distributed by the Kafka Producer's partitioner (by a specified key or a partitioner customized by **kafka.partitioner.class**). 
| + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | partitionIdHeader | ``-`` | When you set this parameter, the sink will take the value of the field named using the value of this property from the event header and send the message to the specified partition of the topic. If the value does not have a valid partition, **EventDeliveryException** is thrown. If the header value already exists, this setting overwrites the **defaultPartitionId** parameter. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Other Kafka Producer Properties | ``-`` | Specifies other Kafka configurations. This parameter can be set to any production configuration supported by Kafka, and the **.kafka** prefix must be added to the configuration. | + +---------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Thrift Sink** + + A Thrift sink converts events to Thrift events and sends them to the monitoring port of the configured host. Common configurations are as follows: + + .. table:: **Table 14** Common configurations of a Thrift sink + + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +===========================+===============+=========================================================================================================================================================================================================+ + | channel | ``-`` | Specifies the channel connected to the sink. 
| + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | type | thrift | Specifies the type of the thrift sink, which must be set to **thrift**. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hostname | ``-`` | Specifies the bound host name or IP address. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | ``-`` | Specifies the bound listening port. Ensure that this port is not occupied. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batch-size | 1000 | Specifies the number of events sent in a batch. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | connect-timeout | 20000 | Specifies the timeout for the first connection, expressed in milliseconds. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | request-timeout | 20000 | Specifies the maximum timeout for a request after the first request, expressed in milliseconds. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberos | false | Specifies whether Kerberos authentication is enabled. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | client-keytab | ``-`` | Specifies the path of the client **keytab** file. The Flume running user must have the access permission on the authentication file. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | client-principal | ``-`` | Specifies the principal of the security user used by the client. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | server-principal | ``-`` | Specifies the principal of the security user used by the server. 
| + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | compression-type | none | Specifies the compression type of data sent by Flume, which can be set to **none** or **deflate**. **none** indicates that data is not compressed, while **deflate** indicates that data is compressed. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxConnections | 5 | Specifies the maximum size of the connection pool for Flume to send data. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | false | Specifies whether to use SSL encryption. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-type | JKS | Specifies the Java trust store type. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore | ``-`` | Specifies the Java trust store file. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore-password | ``-`` | Specifies the Java trust store password. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | reset-connection-interval | 0 | Specifies the interval between a connection failure and a second connection, expressed in seconds. If the parameter is set to **0**, the system continuously attempts to perform a connection. | + +---------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Precautions +----------- + +- What are the reliability measures of Flume? + + - Use the transaction mechanisms between Source and Channel as well as between Channel and Sink. + + - Configure the failover and load_balance mechanisms for Sink Processor. The following shows a load balancing example. + + .. code-block:: + + server.sinkgroups=g1 + server.sinkgroups.g1.sinks=k1 k2 + server.sinkgroups.g1.processor.type=load_balance + server.sinkgroups.g1.processor.backoff=true + server.sinkgroups.g1.processor.selector=random + +- What are the precautions for the aggregation and cascading of multiple Flume agents? + + - Avro or Thrift protocol can be used for cascading. 
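     As a minimal sketch of Avro-based cascading (complementing the load-balancing example above), the configuration below lets a first-tier agent forward events to two aggregation nodes through a failover sink group, so that delivery continues if one aggregation node becomes unavailable. The agent name, channel name, IP addresses, and port shown here are illustrative placeholders, not values mandated by this guide.

     .. code-block::

        # First-tier agent "client": one Avro sink per aggregation node (IP addresses and port are examples)
        client.sinks=k1 k2
        client.sinks.k1.type=avro
        client.sinks.k1.channel=static_log_channel
        client.sinks.k1.hostname=192.168.0.11
        client.sinks.k1.port=21154
        client.sinks.k2.type=avro
        client.sinks.k2.channel=static_log_channel
        client.sinks.k2.hostname=192.168.0.12
        client.sinks.k2.port=21154
        # Failover sink group: events go to k1 first and switch to k2 if k1 is unavailable
        client.sinkgroups=g1
        client.sinkgroups.g1.sinks=k1 k2
        client.sinkgroups.g1.processor.type=failover
        client.sinkgroups.g1.processor.priority.k1=10
        client.sinkgroups.g1.processor.priority.k2=5

     Each aggregation node runs an Avro source listening on the same port to receive the forwarded events, which also keeps the load spread across the aggregation nodes rather than concentrated on a single one.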
+ - When the aggregation end contains multiple nodes, evenly distribute the agents and do not aggregate all agents on a single node. diff --git a/doc/component-operation-guide-lts/source/using_flume/index.rst b/doc/component-operation-guide-lts/source/using_flume/index.rst new file mode 100644 index 0000000..a396eb9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/index.rst @@ -0,0 +1,50 @@ +:original_name: mrs_01_0390.html + +.. _mrs_01_0390: + +Using Flume +=========== + +- :ref:`Using Flume from Scratch ` +- :ref:`Overview ` +- :ref:`Installing the Flume Client on Clusters ` +- :ref:`Viewing Flume Client Logs ` +- :ref:`Stopping or Uninstalling the Flume Client ` +- :ref:`Using the Encryption Tool of the Flume Client ` +- :ref:`Flume Service Configuration Guide ` +- :ref:`Flume Configuration Parameter Description ` +- :ref:`Using Environment Variables in the properties.properties File ` +- :ref:`Non-Encrypted Transmission ` +- :ref:`Encrypted Transmission ` +- :ref:`Viewing Flume Client Monitoring Information ` +- :ref:`Connecting Flume to Kafka in Security Mode ` +- :ref:`Connecting Flume with Hive in Security Mode ` +- :ref:`Configuring the Flume Service Model ` +- :ref:`Introduction to Flume Logs ` +- :ref:`Flume Client Cgroup Usage Guide ` +- :ref:`Secondary Development Guide for Flume Third-Party Plug-ins ` +- :ref:`Common Issues About Flume ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_flume_from_scratch + overview + installing_the_flume_client_on_clusters + viewing_flume_client_logs + stopping_or_uninstalling_the_flume_client + using_the_encryption_tool_of_the_flume_client + flume_service_configuration_guide + flume_configuration_parameter_description + using_environment_variables_in_the_properties.properties_file + non-encrypted_transmission/index + encrypted_transmission/index + viewing_flume_client_monitoring_information + connecting_flume_to_kafka_in_security_mode + connecting_flume_with_hive_in_security_mode + configuring_the_flume_service_model/index + introduction_to_flume_logs + flume_client_cgroup_usage_guide + secondary_development_guide_for_flume_third-party_plug-ins + common_issues_about_flume diff --git a/doc/component-operation-guide-lts/source/using_flume/installing_the_flume_client_on_clusters.rst b/doc/component-operation-guide-lts/source/using_flume/installing_the_flume_client_on_clusters.rst new file mode 100644 index 0000000..db82e27 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/installing_the_flume_client_on_clusters.rst @@ -0,0 +1,132 @@ +:original_name: mrs_01_1595.html + +.. _mrs_01_1595: + +Installing the Flume Client on Clusters +======================================= + +Scenario +-------- + +To use Flume to collect logs, you must install the Flume client on a log host. + +Prerequisites +------------- + +- A cluster with the Flume component has been created. +- The log host is in the same VPC and subnet with the MRS cluster. Log in to the node where the client is installed. For details, see\ :ref:`Using an MRS Client on Nodes Outside a MRS Cluster `. +- You have obtained the username and password for logging in to the log host. +- The installation directory is automatically created if it does not exist. If it exists, the directory must be left blank. The directory path cannot contain any space. + +Procedure +--------- + +#. Obtain the software package. + + Log in to the FusionInsight Manager. Choose **Cluster** > *Name of the target cluster* > **Services** > **Flume**. 
On the Flume service page that is displayed, choose **More** > **Download Client** in the upper right corner and set **Select Client Type** to **Complete Client** to download the Flume service client file. + + The file name of the client is **FusionInsight_Cluster\_**\ <*Cluster ID*>\ **\_Flume_Client.tar**. This section takes the client file **FusionInsight_Cluster_1_Flume_Client.tar** as an example. + +#. Upload the software package. + + As user **user**, upload the software package to a directory (for example, **/opt/client**) on the node where the Flume service client will be installed. + + .. note:: + + **user** is the user who installs and runs the Flume client. + +#. Decompress the software package. + + Log in to the node where the Flume service client is to be installed as user **user**. Go to the directory where the installation package is stored, for example, **/opt/client**, and run the following command to decompress the installation package to the current directory: + + **cd /opt/client** + + **tar -xvf FusionInsight\_Cluster_1_Flume_Client.tar** + +#. Verify the software package. + + Run the **sha256sum -c** command to verify the decompressed file. If **OK** is returned, the verification is successful. Example: + + **sha256sum -c FusionInsight\_Cluster_1_Flume_ClientConfig.tar.sha256** + + .. code-block:: + + FusionInsight_Cluster_1_Flume_ClientConfig.tar: OK + +#. Decompress the package. + + **tar -xvf FusionInsight\_Cluster_1_Flume_ClientConfig.tar** + +#. Check whether the number of clients is **1**. + + - If yes, use the independent installation mode and go to :ref:`7 `. The installation is then complete. + - If no, use the batch installation mode and go to :ref:`8 `. + +#. .. _mrs_01_1595__en-us_topic_0000001173471390_li66632015131310: + + Run the following command in the Flume client installation directory to install the client to a specified directory (for example, **/opt/FlumeClient**). After the client is installed successfully, the installation is complete. + + **cd /opt/client/FusionInsight\_Cluster_1_Flume_ClientConfig/Flume/FlumeClient** + + **./install.sh -d /**\ *opt/FlumeClient* **-f** *service IP address or host name of the MonitorServer role* **-c** *path of the user service configuration file properties.properties* **-s** *CPU threshold* **-l /var/log/Bigdata -e** *FlumeServer service IP address or host name* **-n** *Flume* + + .. note:: + + - **-d**: Flume client installation path + + - (Optional) **-f**: IP addresses or host names of two MonitorServer roles. The IP addresses or host names are separated by commas (,). If this parameter is not configured, the Flume client does not send alarm information to MonitorServer and information about the client cannot be viewed on the FusionInsight Manager GUI. + + - (Optional) **-c**: Service configuration file, which needs to be generated by the user based on the service. For details about how to generate the file on the configuration tool page of the Flume server, see :ref:`Flume Service Configuration Guide `. Upload the file to any directory on the node where the client is to be installed. If this parameter is not specified during the installation, you can upload the generated service configuration file **properties.properties** to the **/opt/FlumeClient/fusioninsight-flume-1.9.0/conf** directory after the installation. + + - (Optional) **-s**: cgroup threshold. The value is an integer ranging from 1 to 100 x *N*. *N* indicates the number of CPU cores.
The default threshold is **-1**, indicating that the processes added to the cgroup are not restricted by the CPU usage. + + - (Optional) **-l**: Log path. The default value is **/var/log/Bigdata**. The user **user** must have the write permission on the directory. When the client is installed for the first time, a subdirectory named **flume-client** is generated. After the installation, subdirectories named **flume-client-**\ *n* will be generated in sequence. The letter *n* indicates a sequence number, which starts from 1 in ascending order. In the **/conf/** directory of the Flume client installation directory, open the **ENV_VARS** file and search for the **FLUME_LOG_DIR** attribute to view the client log path. + + - (Optional) **-e**: Service IP address or host name of FlumeServer, which is used to receive statistics about the monitoring indicators reported by the client. + + - (Optional) **-n**: Name of the Flume client. You can choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Flume Management** on FusionInsight Manager to view the client name on the corresponding node. + + - If the following error message is displayed, run the **export JAVA_HOME=\ JDK path** command. + + .. code-block:: + + JAVA_HOME is null in current user,please install the JDK and set the JAVA_HOME + + - IBM JDK does not support **-Xloggc**. You must change **-Xloggc** to **-Xverbosegclog** in **flume/conf/flume-env.sh**. For 32-bit JDK, the value of **-Xmx** must not exceed 3.25 GB. + + - When installing a cross-platform client in a cluster, use the **/opt/client/FusionInsight_Cluster_1_Flume_ClientConfig/Flume/FusionInsight-Flume-1.9.0.tar.gz** package to install the Flume client. + +#. .. _mrs_01_1595__en-us_topic_0000001173471390_li146677154135: + + Go to the directory for installing clients in batches. + + **cd /opt/client/FusionInsight\_Cluster_1_Flume_ClientConfig/Flume/FlumeClient/batch_install** + + .. note:: + + When installing a cross-platform client in a cluster, use the **/opt/client/FusionInsight_Cluster_1_Flume_ClientConfig/Flume/FusionInsight-Flume-1.9.0.tar.gz** package to install the Flume client. + +#. Configure the **host_info.cfg** file. The format of the configuration file is as follows: + + host_ip="",user="",password="",install_path="",flume_config_file="",monitor_server_ip="",log_path="",flume_server_ip="",cgroup_threshold="",client_name="" + + .. note:: + + - (Mandatory) *host_ip*: IP address of the node where the Flume client is to be installed. + - (Mandatory) *user*: Username used to remotely log in to the node where the Flume client is to be installed. + - (Mandatory) *password*: Password used to remotely log in to the node where the Flume client is to be installed. + - (Mandatory) **install_path**: Installation path of the Flume client. + - (Optional) **flume_config_file**: Configuration file for Flume running. You are advised to specify this configuration file during Flume installation. If you do not set this parameter, retain the value "" and do not delete the parameter. + - (Optional) **monitor_server_ip**: Service IP address of the Flume MonitorServer in the cluster. You can check the IP address on FusionInsight Manager. You can select either of the two IP addresses. If the IP address is not configured, the client does not send alarm information to the cluster when a process is faulty. + - (Optional) **log_path**: Path for storing Flume run logs.
If this parameter is not set, logs are recorded in **/var/log/Bigdata/flume-client**\ ``-``\ *Index* by default. Index value: If there is only one client in this path, the value is 1. If there are multiple clients, the index value is incremented by 1. + - (Optional) **flume_server_ip**: Service IP address of the Flume server. The indicator information of the client is reported to the cluster from this node. The indicator information about the client can be displayed on the web client. If the indicator information is not configured, the client does not display the indicator information. + - (Optional) **cgroup_threshold**: cgroup threshold. The value is an integer ranging from 1 to 100 x *N*. *N* indicates the number of CPU cores. The default threshold is **-1**, indicating that the processes added to the cgroup are not restricted by the CPU usage. + - (Optional) **client_name**: Client name. The client name is displayed on the client monitoring page. If the client name is not configured, the client name is empty. + +#. Run the following command to install the Flume client in batches. + + **./batch_install.sh -p /opt/client/FusionInsight_Cluster_1_Flume_Client.tar** + +#. Delete the password information from the **host_info.cfg** file. + + After the batch installation is complete, delete the password information from the **host_info.cfg** file immediately. Otherwise, the password may be disclosed. diff --git a/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst b/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst new file mode 100644 index 0000000..60e8549 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst @@ -0,0 +1,96 @@ +:original_name: mrs_01_1081.html + +.. _mrs_01_1081: + +Introduction to Flume Logs +========================== + +Log Description +--------------- + +**Log path**: The default path of Flume log files is **/var/log/Bigdata/**\ *Role name*. + +- FlumeServer: **/var/log/Bigdata/flume/flume** +- FlumeClient: **/var/log/Bigdata/flume-client-n/flume** +- MonitorServer: **/var/log/Bigdata/flume/monitor** + +**Log archive rule**: The automatic Flume log compression function is enabled. By default, when the size of logs exceeds 50 MB , logs are automatically compressed into a log file named in the following format: *-.[ID]*\ **.log.zip**. A maximum of 20 latest compressed files are reserved. The number of compressed files can be configured on the Manager portal. + +.. table:: **Table 1** Flume log list + + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | Type | Name | Description | + +==========+=====================================+======================================================================================+ + | Run logs | /flume/flumeServer.log | Log file that records FlumeServer running environment information. 
| + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/install.log | FlumeServer installation log file | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/flumeServer-gc.log.\ ** | GC log file of the FlumeServer process | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/prestartDvietail.log | Work log file before the FlumeServer startup | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/startDetail.log | Startup log file of the Flume process | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/stopDetail.log | Shutdown log file of the Flume process | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/monitorServer.log | Log file that records MonitorServer running environment information | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/startDetail.log | Startup log file of the MonitorServer process | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/stopDetail.log | Shutdown log file of the MonitorServer process | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | function.log | External function invoking log file | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | threadDump-.log | The jstack log file to be printed when the NodeAgent delivers a service stop command | + +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` describes the log levels supported by Flume. + +Levels of run logs are FATAL, ERROR, WARN, INFO, and DEBUG from the highest to the lowest priority. Run logs of equal or higher levels are recorded. The higher the specified log level, the fewer the logs recorded. + +.. _mrs_01_1081__en-us_topic_0000001173789878_tc09b739e3eb34797a6da936a37654e97: + +.. table:: **Table 2** Log level + + +---------+-------+------------------------------------------------------------------------------------------+ + | Type | Level | Description | + +=========+=======+==========================================================================================+ + | Run log | FATAL | Logs of this level record critical error information about system running. | + +---------+-------+------------------------------------------------------------------------------------------+ + | | ERROR | Logs of this level record error information about system running. | + +---------+-------+------------------------------------------------------------------------------------------+ + | | WARN | Logs of this level record exception information about the current event processing. 
| + +---------+-------+------------------------------------------------------------------------------------------+ + | | INFO | Logs of this level record normal running status information about the system and events. | + +---------+-------+------------------------------------------------------------------------------------------+ + | | DEBUG | Logs of this level record the system information and system debugging information. | + +---------+-------+------------------------------------------------------------------------------------------+ + +To modify log levels, perform the following operations: + +#. Go to the **All Configurations** page of Flume by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. On the menu bar on the left, select the log menu of the target role. +#. Select a desired log level. +#. Save the configuration. In the displayed dialog box, click **OK** to make the configurations take effect. + +.. note:: + + The configurations take effect immediately without the need to restart the service. + +Log Format +---------- + +The following table lists the Flume log formats. + +.. table:: **Table 3** Log format + + +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +==========+========================================================================================================================================================+==================================================================================================================================================+ + | Run logs | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2014-12-12 11:54:57,316 \| INFO \| [main] \| log4j dynamic load is start. \| org.apache.flume.tools.LogDynamicLoad.start(LogDynamicLoad.java:59) | + +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ + | | <*yyyy-MM-dd HH:mm:ss,SSS*><*Username*><*User IP*><*Time*><*Operation*><*Resource*><*Result*><*Detail>* | 2014-12-12 23:04:16,572 \| INFO \| [SinkRunner-PollingRunner-DefaultSinkProcessor] \| SRCIP=null OPERATION=close | + +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/configuring_non-encrypted_transmission.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/configuring_non-encrypted_transmission.rst new file mode 100644 index 0000000..4828b43 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/configuring_non-encrypted_transmission.rst @@ -0,0 +1,120 @@ +:original_name: mrs_01_1060.html + +.. 
_mrs_01_1060: + +Configuring Non-encrypted Transmission +====================================== + +Scenario +-------- + +This section describes how to configure Flume server and client parameters after the cluster and the Flume service are installed to ensure proper running of the service. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. + +Prerequisites +------------- + +- The cluster and Flume service have been installed. +- The network environment of the cluster is secure. + +Procedure +--------- + +#. Configure the client parameters of the Flume role. + + a. Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + #. Log in to FusionInsight Manager. Choose **Cluster** > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **client**. Select and drag the source, channel, and sink to be used to the GUI on the right, and connect them. + + For example, use SpoolDir Source, File Channel, and Avro Sink, as shown in :ref:`Figure 1 `. + + .. _mrs_01_1060__en-us_topic_0000001173949126_f48a8b06747cd425f9ce3e09139d567fe: + + .. figure:: /_static/images/en-us_image_0000001296059860.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by referring to :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If the client parameters of the Flume role have been configured, you can obtain the existing client parameter configuration file from *client installation directory*\ **/fusioninsight-flume-1.9.0/conf/properties.properties** to ensure that the configuration is in concordance with the previous. Log in to FusionInsight Manager, choose **Cluster** > **Services** > **Flume** > **Configuration** > **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + #. Click **Export** to save the **properties.properties** configuration file to the local server. + + .. _mrs_01_1060__en-us_topic_0000001173949126_td6a1d78a7c28412b91ec3b4ecdfdfef1: + + .. table:: **Table 1** Parameters to be modified of the Flume role client + + +-----------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+===================================================================================================================+=======================+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | false | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the function is not enabled. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. 
Upload the **properties.properties** file to **flume/conf/** under the installation directory of the Flume client. + +#. Configure the server parameters of the Flume role and upload the configuration file to the cluster. + + a. Use the Flume configuration tool on the FusionInsight Manager portal to configure the server parameters and generate the configuration file. + + #. Log in to FusionInsight Manager. Choose **Cluster** > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **server**. Select and drag the source, channel, and sink to be used to the GUI on the right, and connect them. + + For example, use Avro Source, File Channel, and HDFS Sink, as shown in :ref:`Figure 2 `. + + .. _mrs_01_1060__en-us_topic_0000001173949126_fc16f89e37d8b45e4ab9853d9d2a4824d: + + .. figure:: /_static/images/en-us_image_0000001349259161.png + :alt: **Figure 2** Example for the Flume configuration tool + + **Figure 2** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by referring to :ref:`Table 2 ` based on the actual environment. + + .. note:: + + - If the server parameters of the Flume role have been configured, you can choose **Cluster** > **Services** > **Flume** > **Instance** on FusionInsight Manager. Then select the corresponding Flume role instance and click the **Download** button behind the **flume.config.file** parameter on the **Instance Configurations** page to obtain the existing server parameter configuration file. Choose **Cluster** > **Service** > **Flume** > **Configurations** > **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + - A unique checkpoint directory needs to be configured for each File Channel. + + #. Click **Export** to save the **properties.properties** configuration file to the local server. + + .. _mrs_01_1060__en-us_topic_0000001173949126_tc4c29f50094a4d978a0048c9f8652714: + + .. table:: **Table 2** Parameters to be modified of the Flume role server + + +-----------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+===================================================================================================================+=======================+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | false | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the function is not enabled. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Flume**. On the **Instances** tab page, click **Flume**. + c. Select the Flume role of the node where the configuration file is to be uploaded, choose **Instance Configurations** > **Import** beside the **flume.config.file**, and select the **properties.properties** file. + + .. 
note:: + + - An independent server configuration file can be uploaded to each Flume instance. + - This step is required for updating the configuration file. Modifying the configuration file on the background is an improper operation because the modification will be overwritten after configuration synchronization. + + d. Click **Save**, and then click **OK**. + e. Click **Finish**. diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/index.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/index.rst new file mode 100644 index 0000000..ef72c8c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/index.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_1059.html + +.. _mrs_01_1059: + +Non-Encrypted Transmission +========================== + +- :ref:`Configuring Non-encrypted Transmission ` +- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka ` +- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS ` +- :ref:`Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS ` +- :ref:`Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS ` +- :ref:`Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client ` +- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_non-encrypted_transmission + typical_scenario_collecting_local_static_logs_and_uploading_them_to_kafka + typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs + typical_scenario_collecting_local_dynamic_logs_and_uploading_them_to_hdfs + typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs + typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs_through_the_flume_client + typical_scenario_collecting_local_static_logs_and_uploading_them_to_hbase diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_dynamic_logs_and_uploading_them_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_dynamic_logs_and_uploading_them_to_hdfs.rst new file mode 100644 index 0000000..e5b23cd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_dynamic_logs_and_uploading_them_to_hdfs.rst @@ -0,0 +1,103 @@ +:original_name: mrs_01_1064.html + +.. _mrs_01_1064: + +Typical Scenario: Collecting Local Dynamic Logs and Uploading Them to HDFS +========================================================================== + +Scenario +-------- + +This section describes how to use Flume client to collect dynamic logs from a local PC and save them to the **/flume/test** directory on HDFS. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. The configuration applies to scenarios where only the Flume is configured, for example, Taildir Source+Memory Channel+HDFS Sink. + +Prerequisites +------------- + +- The cluster, HDFS, and Flume service have been installed. +- The Flume client has been installed. 
For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- The network environment of the cluster is secure. +- You have created user **flume_hdfs** and authorized the HDFS directory and data to be operated during log verification. For details, see :ref:`Adding a Ranger Access Permission Policy for HDFS `. + +Procedure +--------- + +#. On FusionInsight Manager, choose **System > User** and choose **More > Download Authentication Credential** to download the Kerberos certificate file of user **flume_hdfs** and save it to the local host. + +#. Set Flume parameters. + + Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + a. Log in to FusionInsight Manager and choose **Cluster** > **Services**. On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab. + + b. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + Use Taildir Source, Memory Channel, and HDFS Sink. + + + .. figure:: /_static/images/en-us_image_0000001295740120.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + c. Double-click the source, channel, and sink. Refer to :ref:`Table 1 ` to set corresponding configuration parameters based on the actual environment. + + .. note:: + + - If you want to continue using the **properties.propretites** file by modifying it, log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services**. On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab, click **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + d. .. _mrs_01_1064__l78938a30f82d4a5283b7c4aaa1bb79b1: + + Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1064__tb525d823c30a44c9a93cf396c6cfa099: + + .. 
table:: **Table 1** Parameters to be modified of the Flume role client + + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +========================+==============================================================================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. | test | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | filegroups | Specifies the file group list name. This parameter cannot be left blank. Values are separated by spaces | epgtest | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | positionFile | Specifies the location where the collected file information (file name and location from which the file collected) is saved. This parameter cannot be left blank. The file does not need to be created manually, but the Flume running user needs to have the write permission on its upper-level directory. 
| /home/omm/flume/positionfile | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batch-size | Specifies the number of events that Flume sends in a batch. | 61200 | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | Specifies the HDFS data write directory. This parameter cannot be left blank. | hdfs://hacluster/flume/test | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | Specifies the prefix of the file that is being written to HDFS. | TMP\_ | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | Specifies the maximum number of events that can be written to HDFS once. | 61200 | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. 
| flume_hdfs | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | Specifies the keytab file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hdfs**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | Specifies whether to use the local time. Possible values are **true** and **false**. | true | + +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Upload the configuration file. + + Upload the file exported in :ref:`2.d ` to the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf** directory of the cluster + +4. Verify log transmission. + + a. Log in to FusionInsight Manager as a user who has the management permission on HDFS. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **HDFS**. On the page that is displayed, click the **NameNode(**\ *Node name*\ **,Active)** link next to **NameNode WebUI** to go to the HDFS web UI. On the displayed page, choose **Utilities** > **Browse the file system**. + + b. Check whether the data is generated in the **/flume/test** directory on the HDFS. + + + .. 
figure:: /_static/images/en-us_image_0000001349259225.png + :alt: **Figure 2** Checking HDFS directories and files + + **Figure 2** Checking HDFS directories and files diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hbase.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hbase.rst new file mode 100644 index 0000000..82c9b12 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hbase.rst @@ -0,0 +1,189 @@ +:original_name: mrs_01_1067.html + +.. _mrs_01_1067: + +Typical Scenario: Collecting Local Static Logs and Uploading Them to HBase +========================================================================== + +Scenario +-------- + +This section describes how to use Flume client to collect static logs from a local computer and upload them to the **flume_test** table of HBase. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. The configuration can apply to scenarios where only the client or the server is configured, for example, Client/Server:Spooldir Source+File Channel+HBase Sink. + +Prerequisites +------------- + +- The cluster, HBase, and Flume service have been installed. +- The Flume client has been installed. For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- The network environment of the cluster is secure. +- An HBase table has been created by running the **create 'flume_test', 'cf'** command. +- The system administrator has understood service requirements and prepared HBase administrator **flume_hbase**. + +Procedure +--------- + +#. On FusionInsight Manager, choose **System > User** and choose **More > Download Authentication Credential** to download the Kerberos certificate file of user **flume_hbase** and save it to the local host. +#. Configure the client parameters of the Flume role. + + a. Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + Use SpoolDir Source, File Channel, and Avro Sink, as shown in :ref:`Figure 1 `. + + .. _mrs_01_1067__en-us_topic_0000001219149245_feb76f182aaac42cbbc167cb040e99cc3: + + .. figure:: /_static/images/en-us_image_0000001349059729.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 1 ` based on the actual environment. + + .. 
note:: + + - If the client parameters of the Flume role have been configured, you can obtain the existing client parameter configuration file from *client installation directory*\ **/fusioninsight-flume-1.9.0/conf/properties.properties** to ensure that the configuration is in concordance with the previous. Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1067__en-us_topic_0000001219149245_t945f3224f8ee46489f0af2556fb132e3: + + .. table:: **Table 1** Parameters to be modified of the Flume role client + + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================+============================================+ + | Name | The value must be unique and cannot be left blank. | test | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | spoolDir | Specifies the directory where the file to be collected resides. This parameter cannot be left blank. The directory needs to exist and have the write, read, and execute permissions on the flume running user. | /srv/BigData/hadoop/data1/zb | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | trackerDir | Specifies the path for storing the metadata of files collected by Flume. 
| /srv/BigData/hadoop/data1/tracker | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | batchSize | Specifies the number of events that Flume sends in a batch (number of data pieces). A larger value indicates higher performance and lower timeliness. | 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | dataDirs | Specifies the directory for storing buffer data. The run directory is used by default. Configuring multiple directories on disks can improve transmission efficiency. Use commas (,) to separate multiple directories. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/data** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flume/data | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | checkpointDir | Specifies the directory for storing the checkpoint information, which is under the run directory by default. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/checkpoint** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flume/checkpoint | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | transactionCapacity | Specifies the transaction size, that is, the number of events in a transaction that can be processed by the current Channel. The size cannot be smaller than the batchSize of Source. Setting the same size as batchSize is recommended. 
| 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | hostname | Specifies the name or IP address of the host whose data is to be sent. This parameter cannot be left blank. Name or IP address must be configured to be the name or IP address that the Avro source associated with it. | 192.168.108.11 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | port | Specifies the port that sends the data. This parameter cannot be left blank. It must be consistent with the port that is monitored by the connected Avro Source. | 21154 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | false | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+ + + b. Upload the **properties.properties** file to **flume/conf/** under the installation directory of the Flume client. + +#. Configure the server parameters of the Flume role and upload the configuration file to the cluster. + + a. Use the Flume configuration tool on the FusionInsight Manager portal to configure the server parameters and generate the configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **server**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. 
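         The layout you assemble on the canvas is what the configuration tool writes into the exported **properties.properties** file. For the Avro Source + File Channel + HBase Sink layout used in this scenario, a rough sketch of such a server-side file is shown below; the agent name **server** comes from this step, the component aliases (**avro_source**, **file_ch**, **hbase_sink**) are assumed names, and the values are the example values from Table 2. The file you actually deploy must be the one exported by the configuration tool, not this sketch.

         .. code-block::

            # Assumed component aliases; the exported file defines its own names.
            server.sources = avro_source
            server.channels = file_ch
            server.sinks = hbase_sink

            # Avro Source (example values from Table 2)
            server.sources.avro_source.type = avro
            server.sources.avro_source.bind = 192.168.108.11
            server.sources.avro_source.port = 21154
            server.sources.avro_source.ssl = false
            server.sources.avro_source.channels = file_ch

            # File Channel (each File Channel needs its own checkpoint directory)
            server.channels.file_ch.type = file
            server.channels.file_ch.checkpointDir = /srv/BigData/hadoop/data1/flumeserver/checkpoint
            server.channels.file_ch.dataDirs = /srv/BigData/hadoop/data1/flumeserver/data
            server.channels.file_ch.transactionCapacity = 61200

            # HBase Sink (example values from Table 2)
            server.sinks.hbase_sink.type = hbase
            server.sinks.hbase_sink.table = flume_test
            server.sinks.hbase_sink.columnFamily = cf
            server.sinks.hbase_sink.batchSize = 61200
            server.sinks.hbase_sink.kerberosPrincipal = flume_hbase
            server.sinks.hbase_sink.kerberosKeytab = /opt/test/conf/user.keytab
            server.sinks.hbase_sink.channel = file_ch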
+ + For example, use Avro Source, File Channel, and HBase Sink, as shown in :ref:`Figure 2 `. + + .. _mrs_01_1067__en-us_topic_0000001219149245_f0496e0941e8448e9b97ce6bcc717406a: + + .. figure:: /_static/images/en-us_image_0000001295740072.png + :alt: **Figure 2** Example for the Flume configuration tool + + **Figure 2** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 2 ` based on the actual environment. + + .. note:: + + - If the server parameters of the Flume role have been configured, you can choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Instance** on FusionInsight Manager. Then select the corresponding Flume role instance and click the **Download** button behind the **flume.config.file** parameter on the **Instance Configurations** page to obtain the existing server parameter configuration file. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + - A unique checkpoint directory needs to be configured for each File Channel. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1067__en-us_topic_0000001219149245_t7f360fe0854d4254a5b1ccd6dcd951fe: + + .. table:: **Table 2** Parameters to be modified of the Flume role server + + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================+=============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. 
| test | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | bind | Specifies the IP address to which Avro Source is bound. This parameter cannot be left blank. It must be configured as the IP address that the server configuration file will upload. | 192.168.108.11 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | port | Specifies the ID of the port that the Avro Source monitors. This parameter cannot be left blank. It must be configured as an unused port. | 21154 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | ssl | Specifies whether to enable the SSL authentication. (You are advised to enable this function to ensure security.) | false | + | | | | + | | Only Sources of the Avro type have this configuration item. | | + | | | | + | | - **true** indicates that the function is enabled. | | + | | - **false** indicates that the client authentication function is not enabled. 
| | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDirs | Specifies the directory for storing buffer data. The run directory is used by default. Configuring multiple directories on disks can improve transmission efficiency. Use commas (,) to separate multiple directories. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/data** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flumeserver/data | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointDir | Specifies the directory for storing the checkpoint information, which is under the run directory by default. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/checkpoint** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flumeserver/checkpoint | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | Specifies the transaction size, that is, the number of events in a transaction that can be processed by the current Channel. The size cannot be smaller than the batchSize of Source. Setting the same size as batchSize is recommended. 
| 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | table | Specifies the HBase table name. This parameter cannot be left blank. | flume_test | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | columnFamily | Specifies the HBase column family name. This parameter cannot be left blank. | cf | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | Specifies the maximum number of events written to HBase by Flume in a batch. | 61200 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. 
| flume_hbase | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kerberosKeytab | Specifies the file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hbase**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + b. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume**. On the displayed page, click the **Flume** role on the **Instance** tab page. + c. Select the Flume role of the node where the configuration file is to be uploaded, choose **Instance Configurations** > **Import** beside the **flume.config.file**, and select the **properties.properties** file. + + .. note:: + + - An independent server configuration file can be uploaded to each Flume instance. + - This step is required for updating the configuration file. Modifying the configuration file on the background is an improper operation because the modification will be overwritten after configuration synchronization. + + d. Click **Save**, and then click **OK**. + e. Click **Finish**. + +4. Verify log transmission. + + a. Go to the directory where the HBase client is installed. + + **cd /**\ *Client installation directory*\ **/ HBase/hbase** + + **kinit flume_hbase** (Enter the password.) + + b. Run the **hbase shell** command to access the HBase client. + + c. Run the **scan 'flume_test'** statement. Logs are written in the HBase column family by line. + + .. 
code-block:: + + hbase(main):001:0> scan 'flume_test' + ROW COLUMN+CELL + 2017-09-18 16:05:36,394 INFO [hconnection-0x415a3f6a-shared--pool2-t1] ipc.AbstractRpcClient: RPC Server Kerberos principal name for service=ClientService is hbase/hadoop.@ + default4021ff4a-9339-4151-a4d0-00f20807e76d column=cf:pCol, timestamp=1505721909388, value=Welcome to flume + incRow column=cf:iCol, timestamp=1505721909461, value=\x00\x00\x00\x00\x00\x00\x00\x01 + 2 row(s) in 0.3660 seconds diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst new file mode 100644 index 0000000..d6683ef --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst @@ -0,0 +1,103 @@ +:original_name: mrs_01_1063.html + +.. _mrs_01_1063: + +Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS +========================================================================= + +Scenario +-------- + +This section describes how to use Flume client to collect static logs from a local PC and save them to the **/flume/test** directory on HDFS. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. The configuration applies to scenarios where only the Flume is configured, for example, Spooldir Source+Memory Channel+HDFS Sink. + +Prerequisites +------------- + +- The cluster, HDFS, and Flume service have been installed. +- The Flume client has been installed. For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- The network environment of the cluster is secure. +- User **flume_hdfs** has been created, and the HDFS directory and data used for log verification have been authorized to the user. + +Procedure +--------- + +#. On FusionInsight Manager, choose **System** > **Permission > User**, select user **flume_hdfs**, and choose **More** > **Download Authentication Credential** to download the Kerberos certificate file of user **flume_hdfs** and save it to the local host. + +#. Set Flume parameters. + + Use Flume on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + a. Log in to FusionInsight Manager. Choose **Cluster** > **Services** > **Flume** > **Configuration Tool**. + + b. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + Use SpoolDir Source, Memory Channel, and HDFS Sink. + + + .. figure:: /_static/images/en-us_image_0000001295900052.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + c. Double-click the source, channel, and sink. Set corresponding configuration parameters by referring to :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If you want to continue using the **properties.propretites** file by modifying it, log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services**. 
On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab, click **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + d. .. _mrs_01_1063__ld87a5f43900a41ad8cda390510028ae7: + + Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1063__t3a5e921315234eb6a22b607e40f19e8a: + + .. table:: **Table 1** Parameters to be modified of the Flume role client + + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +========================+================================================================================================================================================================================================================+============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. | test | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | spoolDir | Specifies the directory where the file to be collected resides. This parameter cannot be left blank. The directory needs to exist and have the write, read, and execute permissions on the flume running user. | /srv/BigData/hadoop/data1/zb | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | trackerDir | Specifies the path for storing the metadata of files collected by Flume. 
| /srv/BigData/hadoop/data1/tracker | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batch-size | Specifies the number of events that Flume sends in a batch. | 61200 | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | Specifies the HDFS data write directory. This parameter cannot be left blank. | hdfs://hacluster/flume/test | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | Specifies the prefix of the file that is being written to HDFS. | TMP\_ | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | Specifies the maximum number of events that can be written to HDFS once. | 61200 | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. 
| flume_hdfs | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | Specifies the keytab file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hdfs**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | Specifies whether to use the local time. Possible values are **true** and **false**. | true | + +------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Upload the configuration file. + + Upload the file exported in :ref:`2.d ` to the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf** directory of the cluster + +4. Verify log transmission. + + a. Log in to FusionInsight Manager as a user who has the management permission on HDFS. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **HDFS**. On the page that is displayed, click the **NameNode(**\ *Node name*\ **,Active)** link next to **NameNode WebUI** to go to the HDFS web UI. On the displayed page, choose **Utilities** > **Browse the file system**. + + b. Check whether the data is generated in the **/flume/test** directory on the HDFS. + + + .. 
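For reference, the file exported in :ref:`2.d ` assembles the values above into a standard Flume agent definition. The following is only a hand-written sketch of what such a SpoolDir Source + Memory Channel + HDFS Sink configuration could look like, using the example values from Table 1 together with generic Apache Flume parameter names; the component names and the channel capacity shown here are assumptions, and the **properties.properties** file actually generated by the configuration tool is authoritative.

.. code-block::

   # Agent name "client", as set in the configuration tool
   client.sources = static_log_source
   client.channels = static_log_channel
   client.sinks = hdfs_sink

   # SpoolDir source: reads static log files from the local collection directory
   client.sources.static_log_source.type = spooldir
   client.sources.static_log_source.spoolDir = /srv/BigData/hadoop/data1/zb
   client.sources.static_log_source.trackerDir = /srv/BigData/hadoop/data1/tracker
   client.sources.static_log_source.batchSize = 61200
   client.sources.static_log_source.channels = static_log_channel

   # Memory channel: buffers events between the source and the sink (capacity is an assumed value)
   client.channels.static_log_channel.type = memory
   client.channels.static_log_channel.capacity = 100000
   client.channels.static_log_channel.transactionCapacity = 61200

   # HDFS sink: writes the collected events to /flume/test on HDFS
   client.sinks.hdfs_sink.type = hdfs
   client.sinks.hdfs_sink.channel = static_log_channel
   client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/flume/test
   client.sinks.hdfs_sink.hdfs.inUsePrefix = TMP_
   client.sinks.hdfs_sink.hdfs.batchSize = 61200
   client.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume_hdfs
   client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/test/conf/user.keytab
   client.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true

..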
figure:: /_static/images/en-us_image_0000001296059892.png + :alt: **Figure 2** Checking HDFS directories and files + + **Figure 2** Checking HDFS directories and files diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_kafka.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_kafka.rst new file mode 100644 index 0000000..3a41ab9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_kafka.rst @@ -0,0 +1,96 @@ +:original_name: mrs_01_1061.html + +.. _mrs_01_1061: + +Typical Scenario: Collecting Local Static Logs and Uploading Them to Kafka +========================================================================== + +Scenario +-------- + +This section describes how to use Flume client to collect static logs from a local host and save them to the topic list (test1) of Kafka. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. The configuration applies to scenarios where only the Flume is configured, for example, Spooldir Source+Memory Channel+Kafka Sink. + +Prerequisites +------------- + +- The cluster, Kafka, and Flume service have been installed. +- The Flume client has been installed. For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- The network environment of the cluster is secure. +- The system administrator has understood service requirements and prepared Kafka administrator **flume_kafka**. + +Procedure +--------- + +#. Set Flume parameters. + + Use the Flume configuration tool on Manager to configure the Flume role client parameters and generate a configuration file. + + a. Log in to FusionInsight Manager. Choose **Cluster** > **Services** > **Flume** > **Configuration Tool**. + + b. Set **Agent Name** to **client**. Select and drag the source, channel, and sink to be used to the GUI on the right, and connect them. + + Use SpoolDir Source, Memory Channel, and Kafka Sink. + + + .. figure:: /_static/images/en-us_image_0000001296219456.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + c. Double-click the source, channel, and sink. Set corresponding configuration parameters by referring to :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If you want to continue using the **properties.propretites** file by modifying it, log in to FusionInsight Manager, choose **Cluster** > **Services**. On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab, click **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + d. .. _mrs_01_1061__l14d98e844ee849a99592f46d8be65b86: + + Click **Export** to save the **properties.properties** configuration file to the local server. + + .. _mrs_01_1061__taa0f737dd6d64fcb98f67e8333b8c44a: + + .. 
table:: **Table 1** Parameters to be modified of the Flume role client + + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | Parameter | Description | Example Value | + +=========================+================================================================================================================================================================================================================+===================================+ + | Name | The value must be unique and cannot be left blank. | test | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | spoolDir | Specifies the directory where the file to be collected resides. This parameter cannot be left blank. The directory needs to exist and have the write, read, and execute permissions on the flume running user. | /srv/BigData/hadoop/data1/zb | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | trackerDir | Specifies the path for storing the metadata of files collected by Flume. | /srv/BigData/hadoop/data1/tracker | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | batchSize | Specifies the number of events that Flume sends in a batch (number of data pieces). A larger value indicates higher performance and lower timeliness. | 61200 | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | kafka.topics | Specifies the list of subscribed Kafka topics, which are separated by commas (,). This parameter cannot be left blank. | test1 | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + | kafka.bootstrap.servers | Specifies the bootstrap IP address and port list of Kafka. The default value is all Kafkabrokers in the Kafka cluster. | 192.168.101.10:21007 | + +-------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+ + +#. Upload the configuration file. + + Upload the file exported in :ref:`1.d ` to the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf** directory of the cluster + +3. Verify log transmission. + + a. 
Log in to the Kafka client. + + **cd** **/**\ *Client installation directory*\ **/Kafka/kafka** + + **kinit flume_kafka** (Enter the password.) + + b. Read data from a Kafka topic. + + **bin/kafka-console-consumer.sh --topic** *topic name* **--bootstrap-server** *Kafka service IP address of the node where the role instance is located*\ **: 21007 --consumer.config config/consumer.properties --from-beginning** + + The system displays the contents of the file to be collected. + + .. code-block:: console + + [root@host1 kafka]# bin/kafka-console-consumer.sh --topic test1 --bootstrap-server 192.168.101.10:21007 --consumer.config config/consumer.properties --from-beginning + Welcome to flume diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs.rst new file mode 100644 index 0000000..f2257f0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs.rst @@ -0,0 +1,105 @@ +:original_name: mrs_01_1065.html + +.. _mrs_01_1065: + +Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS +======================================================================= + +Scenario +-------- + +This section describes how to use Flume client to collect logs from the Topic list (test1) of Kafka and save them to the **/flume/test** directory on HDFS. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. The configuration applies to scenarios where only the Flume is configured, for example, Kafka Source+Memory Channel+HDFS Sink. + +Prerequisites +------------- + +- The cluster, HDFS, Kafka, and Flume service have been installed. +- The Flume client has been installed. For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- The network environment of the cluster is secure. +- You have created user **flume_hdfs** and authorized the HDFS directory and data to be operated during log verification. For details, see :ref:`Adding a Ranger Access Permission Policy for HDFS `. + +Procedure +--------- + +#. On FusionInsight Manager, choose **System > User** and choose **More > Download Authentication Credential** to download the Kerberos certificate file of user **flume_hdfs** and save it to the local host. + +#. Configure the client parameters of the Flume role. + + Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + a. Log in to FusionInsight Manager and choose **Cluster** > **Services**. On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab. + + b. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + For example, use Kafka Source, Memory Channel, and HDFS Sink. + + + .. figure:: /_static/images/en-us_image_0000001295740052.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + c. 
Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 1 ` based on the actual environment. + + .. note:: + + - If you want to continue using the **properties.propretites** file by modifying it, log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services**. On the page that is displayed, choose **Flume**. On the displayed page, click the **Configuration Tool** tab, click **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + d. .. _mrs_01_1065__l92b924df515f493daa8ec019ca9fcec4: + + Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1065__t6c3b4afafa084081b9b2d9400d6ea379: + + .. table:: **Table 1** Parameters to be modified of the Flume role client + + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=========================+=================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. | test | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics | Specifies the subscribed Kafka topic list, in which topics are separated by commas (,). This parameter cannot be left blank. | test1 | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | Specifies the data group ID obtained from Kafka. This parameter cannot be left blank. 
| flume | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | Specifies the bootstrap IP address and port list of Kafka. The default value is all Kafka lists in a Kafka cluster. If Kafka has been installed in the cluster and its configurations have been synchronized, this parameter can be left blank. | 192.168.101.10:9092 | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | Specifies the number of events that Flume sends in a batch (number of data pieces). | 61200 | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | Specifies the HDFS data write directory. This parameter cannot be left blank. | hdfs://hacluster/flume/test | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | Specifies the prefix of the file that is being written to HDFS. | TMP\_ | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | Specifies the maximum number of events that can be written to HDFS once. 
| 61200 | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. | flume_hdfs | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | Specifies the keytab file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hdfs**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | Specifies whether to use the local time. Possible values are **true** and **false**. | true | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Upload the configuration file. + + Upload the file exported in :ref:`2.d ` to the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf** directory of the cluster + +4. Verify log transmission. + + a. Log in to FusionInsight Manager as a user who has the management permission on HDFS. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **HDFS**. On the page that is displayed, click the **NameNode(**\ *Node name*\ **,Active)** link next to **NameNode WebUI** to go to the HDFS web UI. On the displayed page, choose **Utilities** > **Browse the file system**. + + b. 
Check whether the data is generated in the **/flume/test** directory on the HDFS. + + + .. figure:: /_static/images/en-us_image_0000001349059705.png + :alt: **Figure 2** Checking HDFS directories and files + + **Figure 2** Checking HDFS directories and files diff --git a/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs_through_the_flume_client.rst b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs_through_the_flume_client.rst new file mode 100644 index 0000000..f1eeb96 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/non-encrypted_transmission/typical_scenario_collecting_logs_from_kafka_and_uploading_them_to_hdfs_through_the_flume_client.rst @@ -0,0 +1,135 @@ +:original_name: mrs_01_1066.html + +.. _mrs_01_1066: + +Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS Through the Flume Client +================================================================================================ + +Scenario +-------- + +This section describes how to use Flume client to collect logs from the Topic list (test1) of Kafka client and save them to the **/flume/test** directory on HDFS. + +.. note:: + + By default, the cluster network environment is secure and the SSL authentication is not enabled during the data transmission process. For details about how to use the encryption mode, see :ref:`Configuring the Encrypted Transmission `. + +Prerequisites +------------- + +- The cluster, HDFS, Kafka, and Flume service have been installed. +- The Flume client has been installed. For details about how to install the client, see :ref:`Installing the Flume Client on Clusters `. +- You have created user **flume_hdfs** and authorized the HDFS directory and data to be operated during log verification. For details, see :ref:`Adding a Ranger Access Permission Policy for HDFS `. +- The network environment of the cluster is secure. + +Procedure +--------- + +#. On FusionInsight Manager, choose **System > User** and choose **More > Download Authentication Credential** to download the Kerberos certificate file of user **flume_hdfs** and save it to the local host. +#. Configure the client parameters of the Flume role. + + a. Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file. + + #. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool**. + + #. Set **Agent Name** to **client**. Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them. + + For example, use Kafka Source, File Channel, and HDFS Sink, as shown in :ref:`Figure 1 `. + + .. _mrs_01_1066__en-us_topic_0000001173471374_fig1526804343314: + + .. figure:: /_static/images/en-us_image_0000001349059513.png + :alt: **Figure 1** Example for the Flume configuration tool + + **Figure 1** Example for the Flume configuration tool + + #. Double-click the source, channel, and sink. Set corresponding configuration parameters by seeing :ref:`Table 1 ` based on the actual environment. + + .. 
note:: + + - If the client parameters of the Flume role have been configured, you can obtain the existing client parameter configuration file from *client installation directory*\ **/fusioninsight-flume-1.9.0/conf/properties.properties** to ensure that the configuration is in concordance with the previous. Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Configuration Tool** > **Import**, import the file, and modify the configuration items related to non-encrypted transmission. + - It is recommended that the numbers of Sources, Channels, and Sinks do not exceed 40 during configuration file import. Otherwise, the response time may be very long. + + #. Click **Export** to save the **properties.properties** configuration file to the local. + + .. _mrs_01_1066__en-us_topic_0000001173471374_table1127710438338: + + .. table:: **Table 1** Parameters to be modified of the Flume role client + + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=========================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================+ + | Name | The value must be unique and cannot be left blank. | test | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.topics | Specifies the subscribed Kafka topic list, in which topics are separated by commas (,). This parameter cannot be left blank. 
| test1 | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.consumer.group.id | Specifies the data group ID obtained from Kafka. This parameter cannot be left blank. | flume | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka.bootstrap.servers | Specifies the bootstrap IP address and port list of Kafka. The default value is all Kafka lists in a Kafka cluster. If Kafka has been installed in the cluster and its configurations have been synchronized, this parameter can be left blank. | 192.168.101.10:21007 | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | batchSize | Specifies the number of events that Flume sends in a batch (number of data pieces). | 61200 | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | dataDirs | Specifies the directory for storing buffer data. The run directory is used by default. 
Configuring multiple directories on disks can improve transmission efficiency. Use commas (,) to separate multiple directories. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/data** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flume/data | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | checkpointDir | Specifies the directory for storing the checkpoint information, which is under the run directory by default. If the directory is inside the cluster, the **/srv/BigData/hadoop/dataX/flume/checkpoint** directory can be used. **dataX** ranges from data1 to dataN. If the directory is outside the cluster, it needs to be independently planned. | /srv/BigData/hadoop/data1/flume/checkpoint | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | transactionCapacity | Specifies the transaction size, that is, the number of events in a transaction that can be processed by the current Channel. The size cannot be smaller than the batchSize of Source. Setting the same size as batchSize is recommended. | 61200 | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.path | Specifies the HDFS data write directory. This parameter cannot be left blank. 
| hdfs://hacluster/flume/test | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.inUsePrefix | Specifies the prefix of the file that is being written to HDFS. | TMP\_ | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.batchSize | Specifies the maximum number of events that can be written to HDFS once. | 61200 | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosPrincipal | Specifies the Kerberos authentication user, which is mandatory in security versions. This configuration is required only in security clusters. | flume_hdfs | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.kerberosKeytab | Specifies the keytab file path for Kerberos authentication, which is mandatory in security versions. This configuration is required only in security clusters. | /opt/test/conf/user.keytab | + | | | | + | | | .. 
note:: | + | | | | + | | | Obtain the **user.keytab** file from the Kerberos certificate file of the user **flume_hdfs**. In addition, ensure that the user who installs and runs the Flume client has the read and write permissions on the **user.keytab** file. | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs.useLocalTimeStamp | Specifies whether to use the local time. Possible values are **true** and **false**. | true | + +-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + b. Upload the **properties.properties** file to **flume/conf/** under the installation directory of the Flume client. + + c. To connect the Flume client to the HDFS, you need to add the following configuration: + + #. Download the Kerberos certificate of account **flume_hdfs** and obtain the **krb5.conf** configuration file. Upload the configuration file to the **fusioninsight-flume-1.9.0/conf/** directory on the node where the client is installed. + + #. In **fusioninsight-flume-1.9.0/conf/**, create the **jaas.conf** configuration file. + + **vi jaas.conf** + + .. code-block:: + + KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + useKeyTab=true + keyTab="/opt/test/conf/user.keytab" + principal="flume_hdfs@" + useTicketCache=false + storeKey=true + debug=true; + }; + + Values of **keyTab** and **principal** vary depending on the actual situation. + + #. Obtain configuration files **core-site.xml** and **hdfs-site.xml** from **/opt/FusionInsight_Cluster\_\ \ \_Flume_ClientConfig/Flume/config** and upload them to **fusioninsight-flume-1.9.0/conf/**. + + d. Restart the Flume service. + + **flume-manager.sh restart** + +3. Verify log transmission. + + a. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster >** *Name of the desired cluster* > **Services** > **HDFS**, click the HDFS WebUI link of **NameNode** (*Node name*, **Active**) to go to the HDFS WebUI, and choose **Utilities > Browse the file system**. + + b. Check whether the data is generated in the **/flume/test** directory on the HDFS. + + + .. 
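For reference, the following is only a hand-written sketch of what such a Kafka Source + File Channel + HDFS Sink client configuration could look like, based on the example values in Table 1; the component names and the exact parameter keys are assumptions following common Apache Flume naming, and the **properties.properties** file exported from the configuration tool remains the authoritative version.

.. code-block::

   # Agent name "client", as set in the configuration tool
   client.sources = kafka_source
   client.channels = file_channel
   client.sinks = hdfs_sink

   # Kafka source: consumes topic test1 as consumer group "flume"
   client.sources.kafka_source.type = org.apache.flume.source.kafka.KafkaSource
   client.sources.kafka_source.kafka.topics = test1
   client.sources.kafka_source.kafka.consumer.group.id = flume
   client.sources.kafka_source.kafka.bootstrap.servers = 192.168.101.10:21007
   client.sources.kafka_source.batchSize = 61200
   client.sources.kafka_source.channels = file_channel

   # File channel: persists buffered events on local disks
   client.channels.file_channel.type = file
   client.channels.file_channel.dataDirs = /srv/BigData/hadoop/data1/flume/data
   client.channels.file_channel.checkpointDir = /srv/BigData/hadoop/data1/flume/checkpoint
   client.channels.file_channel.transactionCapacity = 61200

   # HDFS sink: writes the events to /flume/test on HDFS
   client.sinks.hdfs_sink.type = hdfs
   client.sinks.hdfs_sink.channel = file_channel
   client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/flume/test
   client.sinks.hdfs_sink.hdfs.inUsePrefix = TMP_
   client.sinks.hdfs_sink.hdfs.batchSize = 61200
   client.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume_hdfs
   client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/test/conf/user.keytab
   client.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true

..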
figure:: /_static/images/en-us_image_0000001296219300.png + :alt: **Figure 2** Checking HDFS directories and files + + **Figure 2** Checking HDFS directories and files diff --git a/doc/component-operation-guide-lts/source/using_flume/overview.rst b/doc/component-operation-guide-lts/source/using_flume/overview.rst new file mode 100644 index 0000000..47e69f0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/overview.rst @@ -0,0 +1,117 @@ +:original_name: mrs_01_0391.html + +.. _mrs_01_0391: + +Overview +======== + +Flume is a distributed, reliable, and highly available system for aggregating massive logs, which can efficiently collect, aggregate, and move massive log data from different data sources and store the data in a centralized data storage system. Various data senders can be customized in the system to collect data. Additionally, Flume provides simple data processes capabilities and writes data to data receivers (which is customizable). + +Flume consists of the client and server, both of which are FlumeAgents. The server corresponds to the FlumeServer instance and is directly deployed in a cluster. The client can be deployed inside or outside the cluster. he client-side and service-side FlumeAgents work independently and provide the same functions. + +The client-side FlumeAgent needs to be independently installed. Data can be directly imported to components such as HDFS and Kafka. Additionally, the client-side and service-side FlumeAgents can also work together to provide services. + +Process +------- + +The process for collecting logs using Flume is as follows: + +#. Installing the flume client +#. Configuring the Flume server and client parameters +#. Collecting and querying logs using the Flume client +#. Stopping and uninstalling the Flume client + + +.. figure:: /_static/images/en-us_image_0000001296060128.png + :alt: **Figure 1** Log collection process + + **Figure 1** Log collection process + +Flume Client +------------ + +A Flume client consists of the source, channel, and sink. The source sends the data to the channel, and then the sink transmits the data from the channel to the external device. :ref:`Table 1 ` describes Flume modules. + +.. _mrs_01_0391__en-us_topic_0000001173789528_t3f29550548a749a4831f4ddfc95df002: + +.. table:: **Table 1** Module description + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Name | Description | + +===================================+=============================================================================================================================================================================+ + | Source | A source receives or generates data and sends the data to one or multiple channels. The source can work in either data-driven or polling mode. | + | | | + | | Typical sources include: | + | | | + | | - Sources that are integrated with the system and receives data, such as Syslog and Netcat | + | | - Sources that automatically generate event data, such as Exec and SEQ | + | | - IPC sources that are used for communication between agents, such as Avro | + | | | + | | A Source must associate with at least one channel. 
| + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Channel | A channel is used to buffer data between a source and a sink. After the sink transmits the data to the next channel or the destination, the cache is deleted automatically. | + | | | + | | The persistency of the channels varies with the channel types: | + | | | + | | - Memory channel: non-persistency | + | | - File channel: persistency implemented based on write-ahead logging (WAL) | + | | - JDBC channel: persistency implemented based on the embedded database | + | | | + | | Channels support the transaction feature to ensure simple sequential operations. A channel can work with sources and sinks of any quantity. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Sink | Sink is responsible for sending data to the next hop or final destination and removing the data from the channel after successfully sending the data. | + | | | + | | Typical sinks include: | + | | | + | | - Sinks that send storage data to the final destination, such as HDFS and Kafka | + | | - Sinks that are consumed automatically, such as Null Sink | + | | - IPC sinks that are used for communication between agents, such as Avro | + | | | + | | A sink must associate with at least one channel. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +A Flume client can have multiple sources, channels, and sinks. A source can send data to multiple channels, and then multiple sinks send the data out of the client. + +Multiple Flume clients can be cascaded. That is, a sink can send data to the source of another client. + +Supplementary Information +------------------------- + +#. Flume provides the following reliability measures: + + - The transaction mechanism is implemented between sources and channels, and between channels and sinks. + + - The sink processor supports the failover and load balancing (load_balance) mechanisms. + + The following is an example of the load balancing (load_balance) configuration: + + .. code-block:: + + server.sinkgroups=g1 + server.sinkgroups.g1.sinks=k1 k2 + server.sinkgroups.g1.processor.type=load_balance + server.sinkgroups.g1.processor.backoff=true + server.sinkgroups.g1.processor.selector=random + +#. The following are precautions for the aggregation and cascading of multiple Flume clients: + + - Avro or Thrift protocol can be used for cascading. + - When the aggregation end contains multiple nodes, evenly distribute the clients to these nodes. Do not connect all the clients to a single node. + +#. The Flume client can contain multiple independent data flows. That is, multiple sources, channels, and sinks can be configured in the **properties.properties** configuration file. These components can be linked to form multiple flows. + + For example, to configure two data flows in a configuration, run the following commands: + + .. 
code-block:: + + server.sources = source1 source2 + server.sinks = sink1 sink2 + server.channels = channel1 channel2 + + #dataflow1 + server.sources.source1.channels = channel1 + server.sinks.sink1.channel = channel1 + + #dataflow2 + server.sources.source2.channels = channel2 + server.sinks.sink2.channel = channel2 diff --git a/doc/component-operation-guide-lts/source/using_flume/secondary_development_guide_for_flume_third-party_plug-ins.rst b/doc/component-operation-guide-lts/source/using_flume/secondary_development_guide_for_flume_third-party_plug-ins.rst new file mode 100644 index 0000000..9fe85b5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/secondary_development_guide_for_flume_third-party_plug-ins.rst @@ -0,0 +1,50 @@ +:original_name: mrs_01_1083.html + +.. _mrs_01_1083: + +Secondary Development Guide for Flume Third-Party Plug-ins +========================================================== + +Scenario +-------- + +This section describes how to perform secondary development for third-party plug-ins. + +Prerequisites +------------- + +- You have obtained the third-party JAR package. + +- You have installed the Flume server or client. + +Procedure +--------- + +#. Package the self-developed code into a JAR file. + +#. Create a directory for the plug-in. + + a. Go to the **$FLUME_HOME/plugins.d** directory and run the following commands to create the plug-in directories: + + **mkdir thirdPlugin** + + **cd thirdPlugin** + + **mkdir lib libext native** + + The command output is displayed as follows: + + |image1| + + b. Place the third-party JAR package in the **$FLUME_HOME/plugins.d/thirdPlugin/lib** directory. If the JAR package depends on other JAR packages, place those dependency JAR packages in the **$FLUME_HOME/plugins.d/thirdPlugin/libext** directory, and place the local library files in the **$FLUME_HOME/plugins.d/thirdPlugin/native** directory. + +#. Configure the **properties.properties** file in **$FLUME_HOME/conf/**. + + For details about how to set parameters in the **properties.properties** file, see the parameter lists in the **properties.properties** files of the corresponding typical scenarios in :ref:`Non-Encrypted Transmission ` and :ref:`Encrypted Transmission `. + + .. note:: + + - **$FLUME_HOME** indicates the Flume installation path. Set this path based on the site requirements (server or client) when configuring third-party plug-ins. + - **thirdPlugin** is the name of the third-party plug-in. + +.. |image1| image:: /_static/images/en-us_image_0000001441209301.png diff --git a/doc/component-operation-guide-lts/source/using_flume/stopping_or_uninstalling_the_flume_client.rst b/doc/component-operation-guide-lts/source/using_flume/stopping_or_uninstalling_the_flume_client.rst new file mode 100644 index 0000000..7029045 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/stopping_or_uninstalling_the_flume_client.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_0394.html + +.. _mrs_01_0394: + +Stopping or Uninstalling the Flume Client +========================================= + +Scenario +-------- + +You can stop, start, or uninstall the Flume client when the Flume data ingestion channel is no longer required. + +Procedure +--------- + +- Stop the Flume client of the Flume role. + + Assume that the Flume client installation path is **/opt/FlumeClient**.
Run the following command to stop the Flume client: + + **cd /opt/FlumeClient/fusioninsight-flume-**\ *Flume component version number*\ **/bin** + + **./flume-manage.sh stop** + + If the following information is displayed after the command execution, the Flume client is successfully stopped. + + .. code-block:: + + Stop Flume PID=120689 successful.. + + .. note:: + + The Flume client will be automatically restarted after being stopped. If you do not need automatic restart, run the following command: + + **./flume-manage.sh stop force** + + If you want to restart the Flume client, run the following command: + + **./flume-manage.sh start force** + +- Uninstall the Flume client of the Flume role. + + Assume that the Flume client installation path is **/opt/FlumeClient**. Run the following command to uninstall the Flume client: + + **cd /opt/FlumeClient/fusioninsight-flume-**\ *Flume component version number*\ **/inst** + + **./uninstall.sh** diff --git a/doc/component-operation-guide-lts/source/using_flume/using_environment_variables_in_the_properties.properties_file.rst b/doc/component-operation-guide-lts/source/using_flume/using_environment_variables_in_the_properties.properties_file.rst new file mode 100644 index 0000000..80db53a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/using_environment_variables_in_the_properties.properties_file.rst @@ -0,0 +1,69 @@ +:original_name: mrs_01_1058.html + +.. _mrs_01_1058: + +Using Environment Variables in the **properties.properties** File +================================================================= + +Scenario +-------- + +This section describes how to use environment variables in the **properties.properties** configuration file. + +Prerequisites +------------- + +The Flume service is running properly and the Flume client has been installed. + +Procedure +--------- + +#. Log in to the node where the Flume client is installed as user **root**. + +#. Switch to the following directory: + + **cd** *Flume client installation directory*/**fusioninsight-flume**\ ``-``\ *Flume component version*/**conf** + +#. Add environment variables to the **flume-env.sh** file in the directory. + + - Format: + + .. code-block:: + + export Variable name=Variable value + + - Example: + + .. code-block:: + + JAVA_OPTS="-Xms2G -Xmx4G -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -DpropertiesImplementation=org.apache.flume.node.EnvVarResolverProperties" + export TAILDIR_PATH=/tmp/flumetest/201907/20190703/1/.*log.* + +#. Restart the Flume instance process. + + a. Log in to FusionInsight Manager. + b. Choose **Cluster** > **Services** > **Flume**. On the page that is displayed, click the **Instance** tab, select all Flume instances, and choose **More** > **Restart Instance**. In the displayed **Verify Identity** dialog box, enter the password, and click **OK**. + + .. important:: + + Do not restart the Flume service on FusionInsight Manager after **flume-env.sh** takes effect on the server. Otherwise, the user-defined environment variables will lost. You only need to restart the corresponding instances on FusionInsight Manager. + +#. .. _mrs_01_1058__li17459142018584: + + In the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf/properties.properties** configuration file, reference variables in the **${**\ *Variable name*\ **}** format. The following is an example: + + .. 
code-block:: + + client.sources.s1.type = TAILDIR + client.sources.s1.filegroups = f1 + client.sources.s1.filegroups.f1 = ${TAILDIR_PATH} + client.sources.s1.positionFile = /tmp/flumetest/201907/20190703/1/taildir_position.json + client.sources.s1.channels = c1 + + .. important:: + + - Ensure that **flume-env.sh** takes effect before you go to :ref:`5 ` to configure the **properties.properties** file. + - If you configure file on the local host, upload the file on FusionInsight Manager by performing the following steps. The user-defined environment variables may be lost if the operations are not performed in the correct sequence. + + a. Log in to FusionInsight Manager. + b. Choose **Cluster** > **Services** > **Flume**. On the page that is displayed, click the **Configurations** tab, select the Flume instance, and click **Upload File** next to **flume.config.file** to upload the **properties.properties** file. diff --git a/doc/component-operation-guide-lts/source/using_flume/using_flume_from_scratch.rst b/doc/component-operation-guide-lts/source/using_flume/using_flume_from_scratch.rst new file mode 100644 index 0000000..90542b4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/using_flume_from_scratch.rst @@ -0,0 +1,158 @@ +:original_name: mrs_01_0397.html + +.. _mrs_01_0397: + +Using Flume from Scratch +======================== + +Scenario +-------- + +You can use Flume to import collected log information to Kafka. + +Prerequisites +------------- + +- A streaming cluster with Kerberos authentication enabled has been created. +- The Flume client has been installed on the node where logs are generated, for example, **/opt/Flumeclient**. The client directory in the following operations is only an example. Change it to the actual installation directory. +- The streaming cluster can properly communicate with the node where logs are generated. + +Using the Flume Client +---------------------- + +.. note:: + + You do not need to perform :ref:`2 ` to :ref:`6 ` for a normal cluster. + +#. Install the client. + + For details, see :ref:`Installing the Flume Client on Clusters `. + +#. .. _mrs_01_0397__en-us_topic_0000001173789216_li81278495417: + + Copy the configuration file of the authentication server from the Master1 node to the *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf** directory on the node where the Flume client resides. + + The full file path is ${BIGDATA_HOME}/FusionInsight_BASE\_\ *XXX*/1_X_KerberosClient/etc/kdc.conf. In the preceding path, **XXX** indicates the product version number. **X** indicates a random number. Change it based on the site requirements. The file must be saved by the user who installs the Flume client, for example, user **root**. + +#. Check the service IP address of any node where the Flume role is deployed. + + Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster > Services > Flume > Instance**. Check the service IP address of any node where the Flume role is deployed. + +#. .. _mrs_01_0397__en-us_topic_0000001173789216_li4130849748: + + Copy the user authentication file from this node to the *Flume client installation directory*\ **/fusioninsight-flume-Flume component version number/conf** directory on the Flume client node. + + The full file path is ${BIGDATA_HOME}/FusionInsight_Porter\_\ *XXX*/install/FusionInsight-Flume-Flume component version number/flume/conf/flume.keytab. 
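+   For reference, the copy can be performed with a standard **scp** command run on the Flume client node. The IP address of the Flume role node, the product version number, the Flume version, and the client installation directory below are placeholders and must be replaced with the actual values:
+
+   .. code-block::
+
+      # Run on the Flume client node. Replace the IP address, ${BIGDATA_HOME}, XXX, and the version numbers with the actual values.
+      scp root@192.168.0.100:${BIGDATA_HOME}/FusionInsight_Porter_XXX/install/FusionInsight-Flume-1.9.0/flume/conf/flume.keytab /opt/FlumeClient/fusioninsight-flume-1.9.0/conf/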
+ + In the preceding paths, **XXX** indicates the product version number. Change it based on the site requirements. The file must be saved by the user who installs the Flume client, for example, user **root**. + +#. Copy the **jaas.conf** file from this node to the **conf** directory on the Flume client node. + + The full file path is ${BIGDATA_HOME}/FusionInsight_Current/1\_\ *X*\ \_Flume/etc/jaas.conf. + + In the preceding path, **X** indicates a random number. Change it based on the site requirements. The file must be saved by the user who installs the Flume client, for example, user **root**. + +#. .. _mrs_01_0397__en-us_topic_0000001173789216_li31329494415: + + Log in to the Flume client node and go to the client installation directory. Run the following command to modify the file: + + **vi conf/jaas.conf** + + Change the full path of the user authentication file defined by **keyTab** to the **Flume client installation directory/fusioninsight-flume-*Flume component version number*/conf** saved in :ref:`4 `, and save the modification and exit. + +#. Run the following command to modify the **flume-env.sh** configuration file of the Flume client: + + **vi** *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf/flume-env.sh** + + Add the following information after **-XX:+UseCMSCompactAtFullCollection**: + + .. code-block:: + + -Djava.security.krb5.conf=Flume client installation directory/fusioninsight-flume-1.9.0/conf/kdc.conf -Djava.security.auth.login.config=Flume client installation directory/fusioninsight-flume-1.9.0/conf/jaas.conf -Dzookeeper.request.timeout=120000 + + For example, **"-XX:+UseCMSCompactAtFullCollection -Djava.security.krb5.conf=\ Flume client installation directory/fusioninsight-flume-*Flume component version number*/conf/kdc.conf -Djava.security.auth.login.config=**\ *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf/jaas.conf -Dzookeeper.request.timeout=120000"** + + Change *Flume client installation directory* to the actual installation directory. Then save and exit. + +#. Run the following command to restart the Flume client: + + **cd** *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/bin** + + **./flume-manage.sh restart** + + Example: + + **cd /opt/FlumeClient/fusioninsight-flume-**\ *Flume component version number*\ **/bin** + + **./flume-manage.sh restart** + +#. Configure jobs based on actual service scenarios. + + - Some parameters can be configured for MRS 3.\ *x* or later on Manager. For details, see :ref:`Non-Encrypted Transmission ` or :ref:`Encrypted Transmission `. + + - Set the parameters in the **properties.properties** file. The following uses SpoolDir Source+File Channel+Kafka Sink as an example. + + Run the following command on the node where the Flume client is installed to configure and save a job in **properties.properties** (Flume client configuration file) based on service requirements by referring to :ref:`Flume Service Configuration Guide `: + + **vi** *Flume client installation directory*\ **/fusioninsight-flume-**\ *Flume component version number*\ **/conf/properties.properties** + + .. 
code-block:: + + ######################################################################################### + client.sources = static_log_source + client.channels = static_log_channel + client.sinks = kafka_sink + ######################################################################################### + #LOG_TO_HDFS_ONLINE_1 + + client.sources.static_log_source.type = spooldir + client.sources.static_log_source.spoolDir = Monitoring directory + client.sources.static_log_source.fileSuffix = .COMPLETED + client.sources.static_log_source.ignorePattern = ^$ + client.sources.static_log_source.trackerDir = Metadata storage path during transmission + client.sources.static_log_source.maxBlobLength = 16384 + client.sources.static_log_source.batchSize = 51200 + client.sources.static_log_source.inputCharset = UTF-8 + client.sources.static_log_source.deserializer = LINE + client.sources.static_log_source.selector.type = replicating + client.sources.static_log_source.fileHeaderKey = file + client.sources.static_log_source.fileHeader = false + client.sources.static_log_source.basenameHeader = true + client.sources.static_log_source.basenameHeaderKey = basename + client.sources.static_log_source.deletePolicy = never + + client.channels.static_log_channel.type = file + client.channels.static_log_channel.dataDirs = Data cache path. Multiple paths, separated by commas (,), can be configured to improve performance. + client.channels.static_log_channel.checkpointDir = Checkpoint storage path + client.channels.static_log_channel.maxFileSize = 2146435071 + client.channels.static_log_channel.capacity = 1000000 + client.channels.static_log_channel.transactionCapacity = 612000 + client.channels.static_log_channel.minimumRequiredSpace = 524288000 + + client.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink + client.sinks.kafka_sink.kafka.topic = Topic to which data is written, for example, flume_test + client.sinks.kafka_sink.kafka.bootstrap.servers = XXX.XXX.XXX.XXX:Kafka port number,XXX.XXX.XXX.XXX:Kafka port number,XXX.XXX.XXX.XXX:Kafka port number + client.sinks.kafka_sink.flumeBatchSize = 1000 + client.sinks.kafka_sink.kafka.producer.type = sync + client.sinks.kafka_sink.kafka.security.protocol = SASL_PLAINTEXT + client.sinks.kafka_sink.kafka.kerberos.domain.name = Kafka domain name. This parameter is mandatory for a security cluster, for example, hadoop.xxx.com. + client.sinks.kafka_sink.requiredAcks = 0 + + client.sources.static_log_source.channels = static_log_channel + client.sinks.kafka_sink.channel = static_log_channel + + .. note:: + + - **client.sinks.kafka_sink.kafka.topic**: Topic to which data is written. If the topic does not exist in Kafka, it is automatically created by default. + + - **client.sinks.kafka_sink.kafka.bootstrap.servers**: List of Kafka Brokers, which are separated by commas (,). By default, the port is **21007** for a security cluster and **9092** for a normal cluster. + + - **client.sinks.kafka_sink.kafka.security.protocol**: The value is **SASL_PLAINTEXT** for a security cluster and **PLAINTEXT** for a normal cluster. + + - **client.sinks.kafka_sink.kafka.kerberos.domain.name**: + + You do not need to set this parameter for a normal cluster. For a security cluster, the value of this parameter is the value of **kerberos.domain.name** in the Kafka cluster. + + In the preceding paths, **X** indicates a random number. Change it based on site requirements. The file must be saved by the user who installs the Flume client, for example, user **root**. + +#. 
After the parameters are set and saved, the Flume client automatically loads the content configured in **properties.properties**. When new log files are generated by spoolDir, the files are sent to Kafka producers and can be consumed by Kafka consumers. For details, see :ref:`Managing Messages in Kafka Topics `. diff --git a/doc/component-operation-guide-lts/source/using_flume/using_the_encryption_tool_of_the_flume_client.rst b/doc/component-operation-guide-lts/source/using_flume/using_the_encryption_tool_of_the_flume_client.rst new file mode 100644 index 0000000..6cabe6e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/using_the_encryption_tool_of_the_flume_client.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_0395.html + +.. _mrs_01_0395: + +Using the Encryption Tool of the Flume Client +============================================= + +Scenario +-------- + +You can use the encryption tool provided by the Flume client to encrypt some parameter values in the configuration file. + +Prerequisites +------------- + +The Flume client has been installed. + +Procedure +--------- + +#. Log in to the Flume client node and go to the client installation directory, for example, **/opt/FlumeClient**. + +#. Run the following command to switch the directory: + + **cd fusioninsight-flume-**\ *Flume component version number*\ **/bin** + +#. Run the following command to encrypt information: + + **./genPwFile.sh** + + Input the information that you want to encrypt twice. + +#. Run the following command to query the encrypted information: + + **cat password.property** + + .. note:: + + If the encryption parameter is used for the Flume server, you need to perform encryption on the corresponding Flume server node. You need to run the encryption script as user **omm** for encryption. diff --git a/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_logs.rst b/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_logs.rst new file mode 100644 index 0000000..21b9e58 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_logs.rst @@ -0,0 +1,55 @@ +:original_name: mrs_01_0393.html + +.. _mrs_01_0393: + +Viewing Flume Client Logs +========================= + +Scenario +-------- + +You can view logs to locate faults. + +Prerequisites +------------- + +The Flume client has been installed. + +Procedure +--------- + +#. Go to the Flume client log directory (**/var/log/Bigdata** by default). + +#. Run the following command to view the log file: + + **ls -lR flume-client-\*** + + A log file is shown as follows: + + .. code-block:: + + flume-client-1/flume: + total 7672 + -rw-------. 1 root root 0 Sep 8 19:43 Flume-audit.log + -rw-------. 1 root root 1562037 Sep 11 06:05 FlumeClient.2017-09-11_04-05-09.[1].log.zip + -rw-------. 1 root root 6127274 Sep 11 14:47 FlumeClient.log + -rw-------. 1 root root 2935 Sep 8 22:20 flume-root-20170908202009-pid72456-gc.log.0.current + -rw-------. 1 root root 2935 Sep 8 22:27 flume-root-20170908202634-pid78789-gc.log.0.current + -rw-------. 1 root root 4382 Sep 8 22:47 flume-root-20170908203137-pid84925-gc.log.0.current + -rw-------. 1 root root 4390 Sep 8 23:46 flume-root-20170908204918-pid103920-gc.log.0.current + -rw-------. 1 root root 3196 Sep 9 10:12 flume-root-20170908215351-pid44372-gc.log.0.current + -rw-------. 1 root root 2935 Sep 9 10:13 flume-root-20170909101233-pid55119-gc.log.0.current + -rw-------. 
1 root root 6441 Sep 9 11:10 flume-root-20170909101631-pid59301-gc.log.0.current + -rw-------. 1 root root 0 Sep 9 11:10 flume-root-20170909111009-pid119477-gc.log.0.current + -rw-------. 1 root root 92896 Sep 11 13:24 flume-root-20170909111126-pid120689-gc.log.0.current + -rw-------. 1 root root 5588 Sep 11 14:46 flume-root-20170911132445-pid42259-gc.log.0.current + -rw-------. 1 root root 2576 Sep 11 13:24 prestartDetail.log + -rw-------. 1 root root 3303 Sep 11 13:24 startDetail.log + -rw-------. 1 root root 1253 Sep 11 13:24 stopDetail.log + + flume-client-1/monitor: + total 8 + -rw-------. 1 root root 141 Sep 8 19:43 flumeMonitorChecker.log + -rw-------. 1 root root 2946 Sep 11 13:24 flumeMonitor.log + + In the log file, **FlumeClient.log** is the run log of the Flume client. diff --git a/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_monitoring_information.rst b/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_monitoring_information.rst new file mode 100644 index 0000000..ac0df88 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flume/viewing_flume_client_monitoring_information.rst @@ -0,0 +1,19 @@ +:original_name: mrs_01_1596.html + +.. _mrs_01_1596: + +Viewing Flume Client Monitoring Information +=========================================== + +Scenario +-------- + +The Flume client outside the FusionInsight cluster is a part of the end-to-end data collection. Both the Flume client outside the cluster and the Flume server in the cluster need to be monitored. Users can use FusionInsight Manager to monitor the Flume client and view the monitoring indicators of the Source, Sink, and Channel of the client as well as the client process status. + +Procedure +--------- + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Flume** > **Flume Management** to view the current Flume client list and process status. +#. Click the **Instance ID**, and view client monitoring metrics in the **Current** area. +#. Click **History**. The page for querying historical monitoring data is displayed. Select a time range and click **View** to view the monitoring data within the time range. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_deal_with_the_restrictions_of_the_phoenix_bulkload_tool.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_deal_with_the_restrictions_of_the_phoenix_bulkload_tool.rst new file mode 100644 index 0000000..26b20f4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_deal_with_the_restrictions_of_the_phoenix_bulkload_tool.rst @@ -0,0 +1,72 @@ +:original_name: mrs_01_2211.html + +.. _mrs_01_2211: + +How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool? +================================================================= + +Question +-------- + +When the indexed field data is updated, if a batch of data exists in the user table, the BulkLoad tool cannot update the global and partial mutable indexes. + +Answer +------ + +**Problem Analysis** + +#. Create a table. + + .. code-block:: + + CREATE TABLE TEST_TABLE( + DATE varchar not null, + NUM integer not null, + SEQ_NUM integer not null, + ACCOUNT1 varchar not null, + ACCOUNTDES varchar, + FLAG varchar, + SALL double, + CONSTRAINT PK PRIMARY KEY (DATE,NUM,SEQ_NUM,ACCOUNT1) + ); + +#. Create a global index. 
+ + **CREATE INDEX TEST_TABLE_INDEX ON TEST_TABLE(ACCOUNT1,DATE,NUM,ACCOUNTDES,SEQ_NUM)**; + +#. Insert data. + + **UPSERT INTO TEST_TABLE (DATE,NUM,SEQ_NUM,ACCOUNT1,ACCOUNTDES,FLAG,SALL) values ('20201001',30201001,13,'367392332','sffa1','','');** + +#. Execute the BulkLoad task to update data. + + **hbase org.apache.phoenix.mapreduce.CsvBulkLoadTool -t TEST_TABLE -i /tmp/test.csv**, where the content of **test.csv** is as follows: + + ======== ======== == ========= ======= ======= == + 20201001 30201001 13 367392332 sffa888 1231243 23 + ======== ======== == ========= ======= ======= == + +#. Symptom: The existing index data cannot be directly updated. As a result, two pieces of index data exist. + + .. code-block:: + + +------------+-----------+-----------+---------------+----------------+ + | :ACCOUNT1 | :DATE | :NUM | 0:ACCOUNTDES | :SEQ_NUM | + +------------+-----------+-----------+---------------+----------------+ + | 367392332 | 20201001 | 30201001 | sffa1 | 13 | + | 367392332 | 20201001 | 30201001 | sffa888 | 13 | + +------------+-----------+-----------+---------------+----------------+ + +**Solution** + +#. Delete the old index table. + + **DROP INDEX TEST_TABLE_INDEX ON TEST_TABLE;** + +#. Create an index table in asynchronous mode. + + **CREATE INDEX TEST_TABLE_INDEX ON TEST_TABLE(ACCOUNT1,DATE,NUM,ACCOUNTDES,SEQ_NUM) ASYNC;** + +#. Recreate a index. + + **hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table TEST_TABLE --index-table TEST_TABLE_INDEX --output-path /user/test_table** diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_delete_residual_table_names_in_the__hbase_table-lock_directory_of_zookeeper.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_delete_residual_table_names_in_the__hbase_table-lock_directory_of_zookeeper.rst new file mode 100644 index 0000000..eaf4801 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_delete_residual_table_names_in_the__hbase_table-lock_directory_of_zookeeper.rst @@ -0,0 +1,23 @@ +:original_name: mrs_01_1652.html + +.. _mrs_01_1652: + +How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper? +===================================================================================== + +Question +-------- + +In security mode, names of tables that failed to be created are unnecessarily retained in the table-lock node (default directory is /hbase/table-lock) of ZooKeeper. How do I delete these residual table names? + +Answer +------ + +Perform the following steps: + +#. On the client, run the kinit command as the hbase user to obtain a security certificate. +#. Run the **hbase zkcli** command to launch the ZooKeeper Command Line Interface (zkCLI). +#. Run the **ls /hbase/table** command on the zkCLI to check whether the table name of the table that fails to be created exists. + + - If the table name exists, no further operation is required. + - If the table name does not exist, run **ls /hbase/table-lock** to check whether the table name of the table fail to be created exist. If the table name exists, run the **delete /hbase/table-lock/** command to delete the table name. In the **delete /hbase/table-lock/
Table name** command, **Table name
** indicates the residual table name. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_fix_region_overlapping.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_fix_region_overlapping.rst new file mode 100644 index 0000000..0b9eb32 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_fix_region_overlapping.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1660.html + +.. _mrs_01_1660: + +How Do I Fix Region Overlapping? +================================ + +Question +-------- + +When the HBaseFsck tool is used to check the region status, if the log contains **ERROR: (regions region1 and region2) There is an overlap in the region chain** or **ERROR: (region region1) Multiple regions have the same startkey: xxx**, overlapping exists in some regions. How do I solve this problem? + +Answer +------ + +To rectify the fault, perform the following steps: + +#. .. _mrs_01_1660__en-us_topic_0000001173949842_l57959cf11dc74b388d62a55b172f9fa6: + + Run the **hbase hbck -repair** *tableName* command to restore the table that contains overlapping. + +#. Run the **hbase hbck** *tableName* command to check whether overlapping exists in the restored table. + + - If overlapping does not exist, go to :ref:`3 `. + - If overlapping exists, go to :ref:`1 `. + +#. .. _mrs_01_1660__en-us_topic_0000001173949842_lc78ee31171b54bc988743bab2a08bbc9: + + Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **More** > **Perform HMaster Switchover** to complete the HMaster active/standby switchover. + +#. Run the **hbase hbck** *tableName* command to check whether overlapping exists in the restored table. + + - If overlapping does not exist, no further action is required. + - If overlapping still exists, start from :ref:`1 ` to perform the recovery again. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_restore_a_region_in_the_rit_state_for_a_long_time.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_restore_a_region_in_the_rit_state_for_a_long_time.rst new file mode 100644 index 0000000..096fc29 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/how_do_i_restore_a_region_in_the_rit_state_for_a_long_time.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1644.html + +.. _mrs_01_1644: + +How Do I Restore a Region in the RIT State for a Long Time? +=========================================================== + +Question +-------- + +How do I restore a region in the RIT state for a long time? + +Answer +------ + +Log in to the HMaster WebUI, choose **Procedure & Locks** in the navigation tree, and check whether any process ID is in the **Waiting** state. If yes, run the following command to release the procedure lock: + +**hbase hbck -j /opt/client/HBase/hbase/tools/hbase-hbck2-*.jar bypass -o** *pid* + +Check whether the state is in the **Bypass** state. If the procedure on the UI is always in **RUNNABLE(Bypass)** state, perform an active/standby switchover. Run the **assigns** command to bring the region online again. 
+ +**hbase hbck -j /opt/client/HBase/hbase/tools/hbase-hbck2-*.jar assigns -o** *regionName* diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/index.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/index.rst new file mode 100644 index 0000000..e50b3cc --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/index.rst @@ -0,0 +1,62 @@ +:original_name: mrs_01_1638.html + +.. _mrs_01_1638: + +Common Issues About HBase +========================= + +- :ref:`Why Does a Client Keep Failing to Connect to a Server for a Long Time? ` +- :ref:`Operation Failures Occur in Stopping BulkLoad On the Client ` +- :ref:`Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively? ` +- :ref:`Why Other Services Become Unstable If HBase Sets up A Large Number of Connections over the Network Port? ` +- :ref:`Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail? ` +- :ref:`How Do I Restore a Region in the RIT State for a Long Time? ` +- :ref:`Why Does HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online? ` +- :ref:`Why Does SocketTimeoutException Occur When a Client Queries HBase? ` +- :ref:`Why Modified and Deleted Data Can Still Be Queried by Using the Scan Command? ` +- :ref:`Why "java.lang.UnsatisfiedLinkError: Permission denied" exception thrown while starting HBase shell? ` +- :ref:`When does the RegionServers listed under "Dead Region Servers" on HMaster WebUI gets cleared? ` +- :ref:`Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload? ` +- :ref:`What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions? ` +- :ref:`How Do I Delete Residual Table Names in the /hbase/table-lock Directory of ZooKeeper? ` +- :ref:`Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS? ` +- :ref:`Why HMaster Times Out While Waiting for Namespace Table to be Assigned After Rebuilding Meta Using OfflineMetaRepair Tool and Startups Failed ` +- :ref:`Why Messages Containing FileNotFoundException and no lease Are Frequently Displayed in the HMaster Logs During the WAL Splitting Process? ` +- :ref:`Insufficient Rights When a Tenant Accesses Phoenix ` +- :ref:`What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"? ` +- :ref:`How Do I Fix Region Overlapping? ` +- :ref:`Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB? ` +- :ref:`Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches? ` +- :ref:`Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used? ` +- :ref:`How Do I Deal with the Restrictions of the Phoenix BulkLoad Tool? ` +- :ref:`Why a Message Is Displayed Indicating that the Permission is Insufficient When CTBase Connects to the Ranger Plug-ins? ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + why_does_a_client_keep_failing_to_connect_to_a_server_for_a_long_time + operation_failures_occur_in_stopping_bulkload_on_the_client + why_may_a_table_creation_exception_occur_when_hbase_deletes_or_creates_the_same_table_consecutively + why_other_services_become_unstable_if_hbase_sets_up_a_large_number_of_connections_over_the_network_port + why_does_the_hbase_bulkload_task_one_table_has_26_tb_data_consisting_of_210,000_map_tasks_and_10,000_reduce_tasks_fail + how_do_i_restore_a_region_in_the_rit_state_for_a_long_time + why_does_hmaster_exits_due_to_timeout_when_waiting_for_the_namespace_table_to_go_online + why_does_sockettimeoutexception_occur_when_a_client_queries_hbase + why_modified_and_deleted_data_can_still_be_queried_by_using_the_scan_command + why_java.lang.unsatisfiedlinkerror_permission_denied_exception_thrown_while_starting_hbase_shell + when_does_the_regionservers_listed_under_dead_region_servers_on_hmaster_webui_gets_cleared + why_are_different_query_results_returned_after_i_use_same_query_criteria_to_query_data_successfully_imported_by_hbase_bulkload + what_should_i_do_if_i_fail_to_create_tables_due_to_the_failed_open_state_of_regions + how_do_i_delete_residual_table_names_in_the__hbase_table-lock_directory_of_zookeeper + why_does_hbase_become_faulty_when_i_set_a_quota_for_the_directory_used_by_hbase_in_hdfs + why_hmaster_times_out_while_waiting_for_namespace_table_to_be_assigned_after_rebuilding_meta_using_offlinemetarepair_tool_and_startups_failed + why_messages_containing_filenotfoundexception_and_no_lease_are_frequently_displayed_in_the_hmaster_logs_during_the_wal_splitting_process + insufficient_rights_when_a_tenant_accesses_phoenix + what_can_i_do_when_hbase_fails_to_recover_a_task_and_a_message_is_displayed_stating_rollback_recovery_failed + how_do_i_fix_region_overlapping + why_does_regionserver_fail_to_be_started_when_gc_parameters_xms_and_xmx_of_hbase_regionserver_are_set_to_31_gb + why_does_the_loadincrementalhfiles_tool_fail_to_be_executed_and_permission_denied_is_displayed_when_nodes_in_a_cluster_are_used_to_import_data_in_batches + why_is_the_error_message_import_argparse_displayed_when_the_phoenix_sqlline_script_is_used + how_do_i_deal_with_the_restrictions_of_the_phoenix_bulkload_tool + why_a_message_is_displayed_indicating_that_the_permission_is_insufficient_when_ctbase_connects_to_the_ranger_plug-ins diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/insufficient_rights_when_a_tenant_accesses_phoenix.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/insufficient_rights_when_a_tenant_accesses_phoenix.rst new file mode 100644 index 0000000..131fb04 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/insufficient_rights_when_a_tenant_accesses_phoenix.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1657.html + +.. _mrs_01_1657: + +Insufficient Rights When a Tenant Accesses Phoenix +================================================== + +Question +-------- + +When a tenant accesses Phoenix, a message is displayed indicating that the tenant has insufficient rights. + +Answer +------ + +You need to associate the HBase service and Yarn queues when creating a tenant. + +The tenant must be granted additional rights to perform operations on Phoenix, that is, the RWX permission on the Phoenix system table. + +Example: + +Tenant **hbase** has been created. 
Log in to the HBase Shell as user **admin** and run the **scan 'hbase:acl'** command to query the role of the tenant. The role is **hbase_1450761169920** (in the format of tenant name_timestamp). + +Run the following commands to grant rights to the tenant (if the Phoenix system table has not been generated, log in to the Phoenix client as user **admin** first and then grant rights on the HBase Shell): + +**grant '@hbase_1450761169920','RWX','SYSTEM.CATALOG'** + +**grant '@hbase_1450761169920','RWX','SYSTEM.FUNCTION'** + +**grant '@hbase_1450761169920','RWX','SYSTEM.SEQUENCE'** + +**grant '@hbase_1450761169920','RWX','SYSTEM.STATS'** + +Create user **phoenix** and bind it with tenant **hbase**, so that tenant **hbase** can access the Phoenix client as user **phoenix**. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/operation_failures_occur_in_stopping_bulkload_on_the_client.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/operation_failures_occur_in_stopping_bulkload_on_the_client.rst new file mode 100644 index 0000000..2ed1e64 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/operation_failures_occur_in_stopping_bulkload_on_the_client.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_1640.html + +.. _mrs_01_1640: + +Operation Failures Occur in Stopping BulkLoad On the Client +=========================================================== + +Question +-------- + +Why do submitted operations fail when BulkLoad is stopped on the client during data import? + +Answer +------ + +When BulkLoad is enabled on the client, a partitioner file is generated and used to demarcate the input data range of Map tasks. The file is automatically deleted when BulkLoad exits on the client. In general, if all Map tasks have been started and are running, stopping BulkLoad on the client does not cause the submitted operations to fail. However, due to the retry and speculative execution mechanism of Map tasks, a Map task is run again if the number of times the Reduce task fails to download the data of a completed Map task exceeds the limit. In this case, if BulkLoad has already exited on the client, the rerun Map task fails and the submitted operation fails because the partitioner file is missing. Therefore, you are advised not to stop BulkLoad on the client while data is being imported. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_can_i_do_when_hbase_fails_to_recover_a_task_and_a_message_is_displayed_stating_rollback_recovery_failed.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_can_i_do_when_hbase_fails_to_recover_a_task_and_a_message_is_displayed_stating_rollback_recovery_failed.rst new file mode 100644 index 0000000..3b3823d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_can_i_do_when_hbase_fails_to_recover_a_task_and_a_message_is_displayed_stating_rollback_recovery_failed.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_1659.html + +.. _mrs_01_1659: + +What Can I Do When HBase Fails to Recover a Task and a Message Is Displayed Stating "Rollback recovery failed"? +=============================================================================================================== + +Question +-------- + +The system automatically rolls back data after an HBase recovery task fails. If "Rollback recovery failed" is displayed, the rollback fails. 
After the rollback fails, data stops being processed and the junk data may be generated. How can I resolve this problem? + +Answer +------ + +You need to manually clear the junk data before performing the backup or recovery task next time. + +#. Install the cluster client in **/opt/client**. + +#. Run **source /opt/client/bigdata_env** as the client installation user to configure the environment variable. + +#. Run **kinit admin**. + +#. Run **zkCli.sh -server** *business IP address of ZooKeeper*\ **:2181** to connect to the ZooKeeper. + +#. Run **deleteall /recovering** to delete the junk data. Run **quit** to disconnect ZooKeeper. + + .. note:: + + Running this command will cause data loss. Exercise caution. + +#. Run **hdfs dfs -rm -f -r /user/hbase/backup** to delete temporary data. + +#. On Manager, view related snapshot name information from recovery task records. + + .. code-block:: + + Snapshot [ snapshot name ] is created successfully before recovery. + +#. Switch to the client, run **hbase shell**, and then **delete_all_snapshot** '*snapshot name*\ **.*'** to delete the temporary snapshot. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_should_i_do_if_i_fail_to_create_tables_due_to_the_failed_open_state_of_regions.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_should_i_do_if_i_fail_to_create_tables_due_to_the_failed_open_state_of_regions.rst new file mode 100644 index 0000000..6707654 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/what_should_i_do_if_i_fail_to_create_tables_due_to_the_failed_open_state_of_regions.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_1651.html + +.. _mrs_01_1651: + +What Should I Do If I Fail to Create Tables Due to the FAILED_OPEN State of Regions? +==================================================================================== + +Question +-------- + +What should I do if I fail to create tables due to the FAILED_OPEN state of Regions? + +Answer +------ + +If a network, HDFS, or Active HMaster fault occurs during the creation of tables, some Regions may fail to go online and therefore enter the FAILED_OPEN state. In this case, tables fail to be created. + +The tables that fail to be created due to the preceding mentioned issue cannot be repaired. To solve this problem, perform the following operations to delete and re-create the tables: + +#. Run the following command on the cluster client to repair the state of the tables: + + **hbase hbck -fixTableStates** + +#. Enter the HBase shell and run the following commands to delete the tables that fail to be created: + + **truncate** *''* + + **disable** *''* + + **drop** *''* + +#. Create the tables using the recreation command. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/when_does_the_regionservers_listed_under_dead_region_servers_on_hmaster_webui_gets_cleared.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/when_does_the_regionservers_listed_under_dead_region_servers_on_hmaster_webui_gets_cleared.rst new file mode 100644 index 0000000..9f5b650 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/when_does_the_regionservers_listed_under_dead_region_servers_on_hmaster_webui_gets_cleared.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1649.html + +.. 
_mrs_01_1649: + +When does the RegionServers listed under "Dead Region Servers" on HMaster WebUI gets cleared? +============================================================================================= + +Question +-------- + +When are the RegionServers listed under "Dead Region Servers" on the HMaster WebUI cleared? + +Answer +------ + +When an online RegionServer goes down abruptly, it is displayed under "Dead Region Servers" in the HMaster WebUI. When the dead RegionServer restarts and reports back to HMaster successfully, it is removed from "Dead Region Servers" in the HMaster WebUI. + +The "Dead Region Servers" list is also cleared when an HMaster failover operation is performed successfully. + +In cases where an Active HMaster hosting some regions is abruptly killed, the Backup HMaster becomes the new Active HMaster and displays the previous Active HMaster as a dead RegionServer. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_a_message_is_displayed_indicating_that_the_permission_is_insufficient_when_ctbase_connects_to_the_ranger_plug-ins.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_a_message_is_displayed_indicating_that_the_permission_is_insufficient_when_ctbase_connects_to_the_ranger_plug-ins.rst new file mode 100644 index 0000000..0920711 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_a_message_is_displayed_indicating_that_the_permission_is_insufficient_when_ctbase_connects_to_the_ranger_plug-ins.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_2212.html + +.. _mrs_01_2212: + +Why a Message Is Displayed Indicating that the Permission is Insufficient When CTBase Connects to the Ranger Plug-ins? +====================================================================================================================== + +Question +-------- + +When CTBase accesses the HBase service with the Ranger plug-ins enabled and you are creating a cluster table, a message is displayed indicating that the permission is insufficient. + +.. code-block:: + + ERROR: Create ClusterTable failed. 
Error: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'ctbase2@HADOOP.COM' (action=create) + at org.apache.ranger.authorization.hbase.AuthorizationSession.publishResults(AuthorizationSession.java:278) + at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.authorizeAccess(RangerAuthorizationCoprocessor.java:654) + at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.requirePermission(RangerAuthorizationCoprocessor.java:772) + at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.preCreateTable(RangerAuthorizationCoprocessor.java:943) + at org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor.preCreateTable(RangerAuthorizationCoprocessor.java:428) + at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:351) + at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:348) + at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:581) + at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:655) + at org.apache.hadoop.hbase.master.MasterCoprocessorHost.preCreateTable(MasterCoprocessorHost.java:348) + at org.apache.hadoop.hbase.master.HMaster$5.run(HMaster.java:2192) + at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134) + at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:2189) + at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:711) + at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) + at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:458) + at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) + at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) + at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) + +Answer +------ + +CTBase users can configure permission policies on the Ranger page and grant the READ, WRITE, CREATE, ADMIN, and EXECUTE permissions to the CTBase metadata table **\_ctmeta\_**, cluster table, and index table. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_are_different_query_results_returned_after_i_use_same_query_criteria_to_query_data_successfully_imported_by_hbase_bulkload.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_are_different_query_results_returned_after_i_use_same_query_criteria_to_query_data_successfully_imported_by_hbase_bulkload.rst new file mode 100644 index 0000000..c95af15 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_are_different_query_results_returned_after_i_use_same_query_criteria_to_query_data_successfully_imported_by_hbase_bulkload.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1650.html + +.. _mrs_01_1650: + +Why Are Different Query Results Returned After I Use Same Query Criteria to Query Data Successfully Imported by HBase bulkload? +=============================================================================================================================== + +Question +-------- + +If the data to be imported by HBase bulkload has identical rowkeys, the data import is successful but identical query criteria produce different query results. 
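+The duplicate cells behind this symptom can be observed with a raw scan in the HBase shell, which returns all cell versions instead of only the latest one. The table name below is only an example:
+
+.. code-block::
+
+   scan 'TEST_TABLE', {RAW => true, VERSIONS => 10}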
+ +Answer +------ + +Data with an identical rowkey is loaded into HBase in the order in which data is read. The data with the latest timestamp is considered to be the latest data. By default, data is not queried by timestamp. Therefore, if you query for data with an identical rowkey, only the latest data is returned. + +While data is being loaded by bulkload, the memory processes the data into HFiles quickly, leading to the possibility that data with an identical rowkey has a same timestamp. In this case, identical query criteria may produce different query results. + +To avoid this problem, ensure that the same data file does not contain identical rowkeys while you are creating tables or loading data. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_a_client_keep_failing_to_connect_to_a_server_for_a_long_time.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_a_client_keep_failing_to_connect_to_a_server_for_a_long_time.rst new file mode 100644 index 0000000..8af71d7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_a_client_keep_failing_to_connect_to_a_server_for_a_long_time.rst @@ -0,0 +1,47 @@ +:original_name: mrs_01_1639.html + +.. _mrs_01_1639: + +Why Does a Client Keep Failing to Connect to a Server for a Long Time? +====================================================================== + +Question +-------- + +A HBase server is faulty and cannot provide services. In this case, when a table operation is performed on the HBase client, why is the operation suspended and no response is received for a long time? + +Answer +------ + +**Problem Analysis** + +When the HBase server malfunctions, the table operation request from the HBase client is tried for several times and times out. The default timeout value is **Integer.MAX_VALUE (2147483647 ms)**. The table operation request is retired constantly during such a long period of time and is suspended at last. + +**Solution** + +The HBase client provides two configuration items to configure the retry and timeout of the client. :ref:`Table 1 ` describes them. + +Set the following parameters in the **Client installation path/HBase/hbase/conf/hbase-site.xml** configuration file: + +.. _mrs_01_1639__en-us_topic_0000001173631290_te9ce661d0c4a4745b801616b66b97321: + +.. table:: **Table 1** Configuration parameters of retry and timeout + + +--------------------------------+-----------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +================================+=====================================================================================================+===============+ + | hbase.client.operation.timeout | Client operation timeout period You need to manually add the information to the configuration file. | 2147483647 ms | + +--------------------------------+-----------------------------------------------------------------------------------------------------+---------------+ + | hbase.client.retries.number | Maximum retry times supported by all retryable operations. | 35 | + +--------------------------------+-----------------------------------------------------------------------------------------------------+---------------+ + +:ref:`Figure 1 ` describes the working principles of retry and timeout. + +.. 
_mrs_01_1639__en-us_topic_0000001173631290_fc7b1b6a1826d4b98bb68d1fd842512cb: + +.. figure:: /_static/images/en-us_image_0000001349139753.jpg + :alt: **Figure 1** Process for HBase client operation retry timeout + + **Figure 1** Process for HBase client operation retry timeout + +The process indicates that a suspension occurs if the preceding parameters are not configured based on site requirements. It is recommended that a proper timeout period be set based on scenarios. If the operation takes a long time, set a long timeout period. If the operation takes a shot time, set a short timeout period. The number of retries can be set to **(hbase.client.retries.number)*60*1000(ms)**. The timeout period can be slightly greater than **hbase.client.operation.timeout**. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hbase_become_faulty_when_i_set_a_quota_for_the_directory_used_by_hbase_in_hdfs.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hbase_become_faulty_when_i_set_a_quota_for_the_directory_used_by_hbase_in_hdfs.rst new file mode 100644 index 0000000..d91e3f0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hbase_become_faulty_when_i_set_a_quota_for_the_directory_used_by_hbase_in_hdfs.rst @@ -0,0 +1,57 @@ +:original_name: mrs_01_1653.html + +.. _mrs_01_1653: + +Why Does HBase Become Faulty When I Set a Quota for the Directory Used by HBase in HDFS? +======================================================================================== + +Question +-------- + +Why does HBase become faulty when I set quota for the directory used by HBase in HDFS? + +Answer +------ + +The flush operation of a table is to write memstore data to HDFS. + +If the HDFS directory does not have sufficient disk space quota, the flush operation will fail and the region server will stop. + +.. code-block:: + + Caused by: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /hbase/data// is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402655638 B = 384.00 MB + ?at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211) + ?at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239) + ?at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:882) + ?at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:711) + ?at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:670) + ?at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:495) + +In the preceding exception, the disk space quota of the **/hbase/data//** table is 1 KB, but the memstore data is 384.00 MB. Therefore, the flush operation fails and the region server stops. + +When the region server is terminated, HMaster replays the WAL file of the terminated region server to restore data. The disk space quota is limited. As a result, the replay operation of the WAL file fails, and the HMaster process exits unexpectedly. + +.. 
code-block:: + + 2016-07-28 19:11:40,352 | FATAL | MASTER_SERVER_OPERATIONS-10-91-9-131:16000-0 | Caught throwable while processing event M_SERVER_SHUTDOWN | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2474) + java.io.IOException: failed log splitting for 10-91-9-131,16020,1469689987884, will retry + ?at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:365) + ?at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220) + ?at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129) + ?at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) + ?at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) + ?at java.lang.Thread.run(Thread.java:745) + Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://hacluster/hbase/WALs/,,-splitting] Task = installed = 6 done = 3 error = 3 + ?at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290) + ?at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:402) + ?at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:375) + +Therefore, you cannot set the quota value for the HBase directory in HDFS. If the exception occurs, perform the following operations: + +#. Run the **kinit** *Username* command on the client to enable the HBase user to obtain security authentication. + +#. Run the **hdfs dfs -count -q** */hbase/data//* command to check the allocated disk space quota. + +#. Run the following command to cancel the quota limit and restore HBase: + + **hdfs dfsadmin -clrSpaceQuota** */hbase/data//* diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hmaster_exits_due_to_timeout_when_waiting_for_the_namespace_table_to_go_online.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hmaster_exits_due_to_timeout_when_waiting_for_the_namespace_table_to_go_online.rst new file mode 100644 index 0000000..4695b78 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_hmaster_exits_due_to_timeout_when_waiting_for_the_namespace_table_to_go_online.rst @@ -0,0 +1,51 @@ +:original_name: mrs_01_1645.html + +.. _mrs_01_1645: + +Why Does HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online? +======================================================================================== + +Question +-------- + +Why does HMaster exit due to timeout when waiting for the namespace table to go online? + +Answer +------ + +During the HMaster active/standby switchover or startup, HMaster performs WAL splitting and region recovery for the RegionServer that failed or was stopped previously. + +Multiple threads are running in the background to monitor the HMaster startup process. + +- TableNamespaceManager + + This is a help class, which is used to manage the allocation of namespace tables and monitoring table regions during HMaster active/standby switchover or startup. If the namespace table is not online within the specified time (**hbase.master.namespace.init.timeout**, which is 3,600,000 ms by default), the thread terminates HMaster abnormally. + +- InitializationMonitor + + This is an initialization thread monitoring class of the primary HMaster, which is used to monitor the initialization of the primary HMaster. 
If the initialization is not complete within the specified time (**hbase.master.initializationmonitor.timeout**, which is 3,600,000 ms by default) and **hbase.master.initializationmonitor.haltontimeout** is set to **true**, the thread terminates HMaster abnormally. The default value of **hbase.master.initializationmonitor.haltontimeout** is **false**. + +During the HMaster active/standby switchover or startup, if the **WAL hlog** file exists, the WAL splitting task is initialized. After the WAL hlog splitting task is complete, the table region allocation task is initialized. + +HMaster uses ZooKeeper to coordinate log splitting tasks among valid RegionServers and to track task progress. If the primary HMaster exits during the log splitting task, the new primary HMaster attempts to resubmit the unfinished tasks, and the RegionServers start the log splitting tasks from the beginning. + +The initialization of HMaster may be delayed for the following reasons: + +- Network faults occur intermittently. +- Disks run into bottlenecks. +- The log splitting task is overloaded, and RegionServer runs slowly. +- RegionServer (region opening) responds slowly. + +In the preceding scenarios, you are advised to adjust the following configuration parameters so that HMaster can complete the restoration earlier. Otherwise, HMaster exits and the entire restoration process takes even longer. + +- Increase the online waiting timeout period of the namespace table to ensure that the Master has enough time to coordinate the splitting tasks of the RegionServer workers and avoid repeated tasks. + + **hbase.master.namespace.init.timeout** (default value: 3,600,000 ms) + +- Increase the number of concurrent splitting tasks per RegionServer worker so that splitting tasks can be processed in parallel (this requires RegionServers with more CPU cores). Add the following parameter to *Client installation path* **/HBase/hbase/conf/hbase-site.xml**: + + **hbase.regionserver.wal.max.splitters** (default value: 2) + +- If the entire restoration process requires more time, increase the timeout period of the initialization monitoring thread. + + **hbase.master.initializationmonitor.timeout** (default value: 3,600,000 ms) diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_regionserver_fail_to_be_started_when_gc_parameters_xms_and_xmx_of_hbase_regionserver_are_set_to_31_gb.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_regionserver_fail_to_be_started_when_gc_parameters_xms_and_xmx_of_hbase_regionserver_are_set_to_31_gb.rst new file mode 100644 index 0000000..fc819e8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_regionserver_fail_to_be_started_when_gc_parameters_xms_and_xmx_of_hbase_regionserver_are_set_to_31_gb.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1661.html + +.. _mrs_01_1661: + +Why Does RegionServer Fail to Be Started When GC Parameters Xms and Xmx of HBase RegionServer Are Set to 31 GB? +=============================================================================================================== + +Question +-------- + +Check the **hbase-omm-*.out** log of the node where RegionServer fails to be started. It is found that the log contains **An error report file with more information is saved as: /tmp/hs_err_pid*.log**. Check the **/tmp/hs_err_pid*.log** file.
It is found that the log contains **#Internal Error (vtableStubs_aarch64.cpp:213), pid=9456, tid=0x0000ffff97fdd200 and #guarantee(_\_ pc() <= s->code_end()) failed: overflowed buffer**, indicating that the problem is caused by JDK. How do I solve this problem? + +Answer +------ + +To rectify the fault, perform the following steps: + +#. Run the **su - omm** command on a node where RegionServer fails to be started to switch to user **omm**. + +#. Run the **java -XX:+PrintFlagsFinal -version \|grep HeapBase** command as user **omm**. Information similar to the following is displayed: + + .. code-block:: + + uintx HeapBaseMinAddress = 2147483648 {pd product} + +#. Change the values of **-Xms** and **-Xmx** in **GC_OPTS** to values that are not between **32G-HeapBaseMinAddress** and **32G**, excluding the values of **32G** and **32G-HeapBaseMinAddress**. + +#. Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **Instance**, select the failed instance, and choose **More** > **Restart Instance** to restart the failed instance. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_sockettimeoutexception_occur_when_a_client_queries_hbase.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_sockettimeoutexception_occur_when_a_client_queries_hbase.rst new file mode 100644 index 0000000..9b2e5db --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_sockettimeoutexception_occur_when_a_client_queries_hbase.rst @@ -0,0 +1,56 @@ +:original_name: mrs_01_1646.html + +.. _mrs_01_1646: + +Why Does SocketTimeoutException Occur When a Client Queries HBase? +================================================================== + +Question +-------- + +Why does the following exception occur on the client when I use the HBase client to operate table data? + +.. code-block:: + + 2015-12-15 02:41:14,054 | WARN | [task-result-getter-2] | Lost task 2.0 in stage 58.0 (TID 3288, linux-175): + org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: + Tue Dec 15 02:41:14 CST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60303: + row 'xxxxxx' on table 'xxxxxx' at region=xxxxxx,\x05\x1E\x80\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x000\x00\x80\x00\x00\x00\x80\x00\x00\x00\x80\x00\x00, + 1449912620868.6a6b7d0c272803d8186930a3bfdb10a9., hostname=xxxxxx,16020,1449941841479, seqNum=5 + at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:275) + at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:223) + at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) + at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) + at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:323) + +At the same time, the following log is displayed on RegionServer: + +.. 
code-block:: + + 2015-12-15 02:45:44,551 | WARN | PriorityRpcServer.handler=7,queue=1,port=16020 | (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest) + ","starttimems":1450118730780,"responsesize":416,"method":"Scan","processingtimems":13770,"client":"10.91.8.175:41182","queuetimems":0,"class":"HRegionServer"} | + org.apache.hadoop.hbase.ipc.RpcServer.logResponse(RpcServer.java:2221) + 2015-12-15 02:45:57,722 | WARN | PriorityRpcServer.handler=3,queue=1,port=16020 | (responseTooSlow): + {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1450118746297,"responsesize":416, + "method":"Scan","processingtimems":11425,"client":"10.91.8.175:41182","queuetimems":1746,"class":"HRegionServer"} | org.apache.hadoop.hbase.ipc.RpcServer.logResponse(RpcServer.java:2221) + 2015-12-15 02:47:21,668 | INFO | LruBlockCacheStatsExecutor | totalSize=7.54 GB, freeSize=369.52 MB, max=7.90 GB, blockCount=406107, + accesses=35400006, hits=16803205, hitRatio=47.47%, , cachingAccesses=31864266, cachingHits=14806045, cachingHitsRatio=46.47%, + evictions=17654, evicted=16642283, evictedPerRun=942.69189453125 | org.apache.hadoop.hbase.io.hfile.LruBlockCache.logStats(LruBlockCache.java:858) + 2015-12-15 02:52:21,668 | INFO | LruBlockCacheStatsExecutor | totalSize=7.51 GB, freeSize=395.34 MB, max=7.90 GB, blockCount=403080, + accesses=35685793, hits=16933684, hitRatio=47.45%, , cachingAccesses=32150053, cachingHits=14936524, cachingHitsRatio=46.46%, + evictions=17684, evicted=16800617, evictedPerRun=950.046142578125 | org.apache.hadoop.hbase.io.hfile.LruBlockCache.logStats(LruBlockCache.java:858) + +Answer +------ + +The memory allocated to RegionServer is too small and the number of Regions is too large. As a result, the memory is insufficient during the running, and the server responds slowly to the client. Modify the following memory allocation parameters in the **hbase-site.xml** configuration file of RegionServer: + +.. table:: **Table 1** RegionServer memory allocation parameters + + +------------------------+-----------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +========================+=====================================================================================================+=========================================================================================================================+ + | GC_OPTS | Initial memory and maximum memory allocated to RegionServer in startup parameters. | -Xms8G -Xmx8G | + +------------------------+-----------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ + | hfile.block.cache.size | Percentage of the maximum heap (-Xmx setting) allocated to the block cache of HFiles or StoreFiles. | When **offheap** is disabled, the default value is **0.25**. When **offheap** is enabled, the default value is **0.1**. 
| + +------------------------+-----------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_hbase_bulkload_task_one_table_has_26_tb_data_consisting_of_210,000_map_tasks_and_10,000_reduce_tasks_fail.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_hbase_bulkload_task_one_table_has_26_tb_data_consisting_of_210,000_map_tasks_and_10,000_reduce_tasks_fail.rst new file mode 100644 index 0000000..4c38bfe --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_hbase_bulkload_task_one_table_has_26_tb_data_consisting_of_210,000_map_tasks_and_10,000_reduce_tasks_fail.rst @@ -0,0 +1,25 @@ +:original_name: mrs_01_1643.html + +.. _mrs_01_1643: + +Why Does the HBase BulkLoad Task (One Table Has 26 TB Data) Consisting of 210,000 Map Tasks and 10,000 Reduce Tasks Fail? +========================================================================================================================= + +Question +-------- + +The HBase bulkLoad task (a single table contains 26 TB data) has 210,000 maps and 10,000 reduce tasks, and the task fails. + +Answer +------ + +**ZooKeeper I/O bottleneck observation methods:** + +#. On the monitoring page of Manager, check whether the number of ZooKeeper requests on a single node exceeds the upper limit. +#. View ZooKeeper and HBase logs to check whether a large number of I/O Exception Timeout or SocketTimeout Exception exceptions occur. + +**Optimization suggestions:** + +#. Change the number of ZooKeeper instances to 5 or more. You are advised to set **peerType** to **observer** to increase the number of observers. +#. Control the number of concurrent maps of a single task or reduce the memory for running tasks on each node to lighten the node load. +#. Upgrade ZooKeeper data disks, such as SSDs. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_loadincrementalhfiles_tool_fail_to_be_executed_and_permission_denied_is_displayed_when_nodes_in_a_cluster_are_used_to_import_data_in_batches.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_loadincrementalhfiles_tool_fail_to_be_executed_and_permission_denied_is_displayed_when_nodes_in_a_cluster_are_used_to_import_data_in_batches.rst new file mode 100644 index 0000000..5be5252 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_does_the_loadincrementalhfiles_tool_fail_to_be_executed_and_permission_denied_is_displayed_when_nodes_in_a_cluster_are_used_to_import_data_in_batches.rst @@ -0,0 +1,72 @@ +:original_name: mrs_01_0625.html + +.. _mrs_01_0625: + +Why Does the LoadIncrementalHFiles Tool Fail to Be Executed and "Permission denied" Is Displayed When Nodes in a Cluster Are Used to Import Data in Batches? +============================================================================================================================================================ + +Question +-------- + +Why does the LoadIncrementalHFiles tool fail to be executed and "Permission denied" is displayed when a Linux user is manually created in a normal cluster and DataNode in the cluster is used to import data in batches? + +.. 
code-block:: + + 2020-09-20 14:53:53,808 WARN [main] shortcircuit.DomainSocketFactory: error creating DomainSocket + java.net.ConnectException: connect(2) error: Permission denied when trying to connect to '/var/run/FusionInsight-HDFS/dn_socket' + at org.apache.hadoop.net.unix.DomainSocket.connect0(Native Method) + at org.apache.hadoop.net.unix.DomainSocket.connect(DomainSocket.java:256) + at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.createSocket(DomainSocketFactory.java:168) + at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextDomainPeer(BlockReaderFactory.java:804) + at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:526) + at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:785) + at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:722) + at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) + at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) + at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:663) + at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:594) + at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776) + at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:845) + at java.io.DataInputStream.readFully(DataInputStream.java:195) + at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401) + at org.apache.hadoop.hbase.io.hfile.HFile.isHFileFormat(HFile.java:651) + at org.apache.hadoop.hbase.io.hfile.HFile.isHFileFormat(HFile.java:634) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:1090) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:1006) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.prepareHFileQueue(LoadIncrementalHFiles.java:257) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:364) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1263) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1276) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:1311) + at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) + at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:1333) + +Answer +------ + +If the client that the LoadIncrementalHFiles tool depends on is installed in the cluster and is on the same node as DataNode, HDFS creates short-circuit read during the execution of the tool to improve performance. The short-circuit read depends on the **/var/run/FusionInsight-HDFS** directory (**dfs.domain.socket.path**). The default permission on this directory is **750**. This user does not have the permission to operate the directory. + +To solve the preceding problem, perform the following operations: + +Method 1: Create a user (recommended). + +#. Create a user on Manager. By default, the user group contains the **ficommon** group. + + .. code-block:: console + + [root@xxx-xxx-xxx-xxx ~]# id test + uid=20038(test) gid=9998(ficommon) groups=9998(ficommon) + +#. Import data again. + +Method 2: Change the owner group of the current user. + +#. Add the user to the **ficommon** group. + + .. 
code-block:: console + + [root@xxx-xxx-xxx-xxx ~]# usermod -a -G ficommon test + [root@xxx-xxx-xxx-xxx ~]# id test + uid=2102(test) gid=2102(test) groups=2102(test),9998(ficommon) + +#. Import data again. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_hmaster_times_out_while_waiting_for_namespace_table_to_be_assigned_after_rebuilding_meta_using_offlinemetarepair_tool_and_startups_failed.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_hmaster_times_out_while_waiting_for_namespace_table_to_be_assigned_after_rebuilding_meta_using_offlinemetarepair_tool_and_startups_failed.rst new file mode 100644 index 0000000..6905c85 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_hmaster_times_out_while_waiting_for_namespace_table_to_be_assigned_after_rebuilding_meta_using_offlinemetarepair_tool_and_startups_failed.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_1654.html + +.. _mrs_01_1654: + +Why Does HMaster Time Out While Waiting for the Namespace Table to Be Assigned After Rebuilding Meta Using the OfflineMetaRepair Tool and Fail to Start? +======================================================================================================================================================== + +Question +-------- + +Why does HMaster time out while waiting for the namespace table to be assigned after meta is rebuilt using the OfflineMetaRepair tool, and why does the startup fail? + +HMaster aborts with the following FATAL message: + +.. code-block:: + + 2017-06-15 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. + java.io.IOException: Timedout 120000ms waiting for namespace table to be assigned + at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98) + at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054) + at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848) + at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199) + at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871) + at java.lang.Thread.run(Thread.java:745) + +Answer +------ + +When meta is rebuilt by the OfflineMetaRepair tool, HMaster waits for the WALs of all RegionServers to be split during startup to avoid data inconsistency. HMaster triggers the assignment of user regions once WAL splitting is complete. When the cluster is in an unusual state, WAL splitting may take a long time, depending on multiple factors such as too many WALs, slow I/O, and unstable RegionServers. + +HMaster must be able to finish splitting the WALs of all RegionServers successfully. Perform the following steps: + +#. Make sure that the cluster is stable and that no other problems exist. If any problem occurs, rectify it first. +#. Configure a larger value for the **hbase.master.initializationmonitor.timeout** parameter. The default value is **3600000** milliseconds. +#. Restart the HBase service.
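+For reference, the following minimal sketch shows how this parameter could be set in the server-side **hbase-site.xml** file. The value **7200000** (2 hours, in milliseconds) is only an example; choose a value based on the expected WAL splitting duration in your cluster, and restart the HBase service for the change to take effect.
+
+.. code-block::
+
+   <property>
+     <!-- Example only: extend the HMaster initialization monitor timeout to 2 hours (in ms). -->
+     <name>hbase.master.initializationmonitor.timeout</name>
+     <value>7200000</value>
+   </property>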
diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_is_the_error_message_import_argparse_displayed_when_the_phoenix_sqlline_script_is_used.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_is_the_error_message_import_argparse_displayed_when_the_phoenix_sqlline_script_is_used.rst new file mode 100644 index 0000000..ea512e0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_is_the_error_message_import_argparse_displayed_when_the_phoenix_sqlline_script_is_used.rst @@ -0,0 +1,17 @@ +:original_name: mrs_01_2210.html + +.. _mrs_01_2210: + +Why Is the Error Message "import argparse" Displayed When the Phoenix sqlline Script Is Used? +============================================================================================= + +Question +-------- + +When the sqlline script is used on the client, the error message "import argparse" is displayed. + +Answer +------ + +#. Log in to the node where the HBase client is installed as user **root**. Perform security authentication using the **hbase** user. +#. Go to the directory where the sqlline script of the HBase client is stored and run the **python3 sqlline.py** command. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_java.lang.unsatisfiedlinkerror_permission_denied_exception_thrown_while_starting_hbase_shell.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_java.lang.unsatisfiedlinkerror_permission_denied_exception_thrown_while_starting_hbase_shell.rst new file mode 100644 index 0000000..5f3c304 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_java.lang.unsatisfiedlinkerror_permission_denied_exception_thrown_while_starting_hbase_shell.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1648.html + +.. _mrs_01_1648: + +Why "java.lang.UnsatisfiedLinkError: Permission denied" exception thrown while starting HBase shell? +==================================================================================================== + +Question +-------- + +Why "java.lang.UnsatisfiedLinkError: Permission denied" exception thrown while starting HBase shell? + +Answer +------ + +During HBase shell execution JRuby create temporary files under **java.io.tmpdir** path and default value of **java.io.tmpdir** is **/tmp**. If NOEXEC permission is set to /tmp directory then HBase shell start will fail with "java.lang.UnsatisfiedLinkError: Permission denied" exception. + +So "java.io.tmpdir" must be set to a different path in HBASE_OPTS/CLIENT_GC_OPTS if NOEXEC is set to /tmp directory. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_may_a_table_creation_exception_occur_when_hbase_deletes_or_creates_the_same_table_consecutively.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_may_a_table_creation_exception_occur_when_hbase_deletes_or_creates_the_same_table_consecutively.rst new file mode 100644 index 0000000..d286ba3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_may_a_table_creation_exception_occur_when_hbase_deletes_or_creates_the_same_table_consecutively.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1641.html + +.. _mrs_01_1641: + +Why May a Table Creation Exception Occur When HBase Deletes or Creates the Same Table Consecutively? 
+==================================================================================================== + +Question +-------- + +When HBase consecutively deletes and creates the same table, why may a table creation exception occur? + +Answer +------ + +Execution process: Disable Table > Drop Table > Create Table > Disable Table > Drop Table > And more + +#. When a table is disabled, HMaster sends an RPC request to RegionServer, and RegionServer brings the region offline. When the time required for closing a region on RegionServer exceeds the timeout period for HBase HMaster to wait for the region to enter the RIT state, HMaster considers that the region is offline by default. Actually, the region may be in the flush memstore phase. +#. After an RPC request is sent to close a region, HMaster checks whether all regions in the table are offline. If the closure times out, HMaster considers that the regions are offline and returns a message indicating that the regions are successfully closed. +#. After the closure is successful, the data directory corresponding to the HBase table is deleted. +#. After the table is deleted, the data directory is recreated by the region that is still in the flush memstore phase. +#. When the table is created again, the **temp** directory is copied to the HBase data directory. However, the HBase data directory is not empty. As a result, when the HDFS rename API is called, the data directory changes to the last layer of the **temp** directory and is appended to the HBase data directory, for example, **$rootDir/data/$nameSpace/$tableName/$tableName**. In this case, the table fails to be created. + +**Troubleshooting Method** + +When this problem occurs, check whether the HBase data directory corresponding to the table exists. If it exists, rename the directory. + +The HBase data directory consists of **$rootDir/data/$nameSpace/$tableName**, for example, **hdfs://hacluster/hbase/data/default/TestTable**. **$rootDir** is the HBase root directory, which can be obtained by configuring **hbase.rootdir.perms** in **hbase-site.xml**. The **data** directory is a fixed directory of HBase. **$nameSpace** indicates the nameSpace name. **$tableName** indicates the table name. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_messages_containing_filenotfoundexception_and_no_lease_are_frequently_displayed_in_the_hmaster_logs_during_the_wal_splitting_process.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_messages_containing_filenotfoundexception_and_no_lease_are_frequently_displayed_in_the_hmaster_logs_during_the_wal_splitting_process.rst new file mode 100644 index 0000000..4e85af3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_messages_containing_filenotfoundexception_and_no_lease_are_frequently_displayed_in_the_hmaster_logs_during_the_wal_splitting_process.rst @@ -0,0 +1,59 @@ +:original_name: mrs_01_1655.html + +.. _mrs_01_1655: + +Why Messages Containing FileNotFoundException and no lease Are Frequently Displayed in the HMaster Logs During the WAL Splitting Process? +========================================================================================================================================= + +Question +-------- + +Why messages containing FileNotFoundException and no lease are frequently displayed in the HMaster logs during the WAL splitting process? + +.. 
code-block:: + + 2017-06-10 09:50:27,586 | ERROR | split-log-closeStream-2 | Couldn't close log at hdfs://hacluster/hbase/data/default/largeT1/2b48346d087275fe751fc049334fda93/recovered.edits/0000000000000000000.temp | org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink$2.call(WALSplitter.java:1330) + java.io.FileNotFoundException: No lease on /hbase/data/default/largeT1/2b48346d087275fe751fc049334fda93/recovered.edits/0000000000000000000.temp (inode 1092653): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_1202985678_1, pendingcreates: 1936] + ?at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3432) + ?at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3223) + ?at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3057) + ?at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3011) + ?at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:842) + ?at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:526) + ?at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) + ?at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) + ?at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973) + ?at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2260) + ?at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2256) + ?at java.security.AccessController.doPrivileged(Native Method) + ?at javax.security.auth.Subject.doAs(Subject.java:422) + ?at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769) + ?at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2254) + + ?at sun.reflect.GeneratedConstructorAccessor40.newInstance(Unknown Source) + ?at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) + ?at java.lang.reflect.Constructor.newInstance(Constructor.java:423) + ?at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) + ?at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) + ?at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1842) + ?at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1639) + ?at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:665) + +Answer +------ + +During the WAL splitting process, the WAL splitting timeout period is specified by the **hbase.splitlog.manager.timeout** parameter. If the WAL splitting process fails to complete within the timeout period, the task is submitted again. Multiple WAL splitting tasks may be submitted during a specified period. If the **temp** file is deleted when one WAL splitting task completes, other tasks cannot find the file and the FileNotFoudException exception is reported. To avoid the problem, perform the following modifications: + +The default value of **hbase.splitlog.manager.timeout** is 600,000 ms. The cluster specification is that each RegionServer has 2,000 to 3,000 regions. When the cluster is normal (HBase is normal and HDFS does not have a large number of read and write operations), you are advised to adjust this parameter based on the cluster specifications. 
If the actual specifications (the actual average number of regions on each RegionServer) are greater than the default specifications (the default average number of regions on each RegionServer, that is, 2,000), the adjustment solution is (actual specifications/default specifications) x Default time. + +Set the **splitlog** parameter in the **hbase-site.xml** file on the server. :ref:`Table 1 ` describes the parameter. + +.. _mrs_01_1655__en-us_topic_0000001219149705_td061a2527dd94860b0b6d9989d7fd9ee: + +.. table:: **Table 1** Description of the **splitlog** parameter + + +--------------------------------+----------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +================================+==============================================================================================+===============+ + | hbase.splitlog.manager.timeout | Timeout period for receiving worker response by the distributed SplitLog management program. | 600000 | + +--------------------------------+----------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_modified_and_deleted_data_can_still_be_queried_by_using_the_scan_command.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_modified_and_deleted_data_can_still_be_queried_by_using_the_scan_command.rst new file mode 100644 index 0000000..8d62aef --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_modified_and_deleted_data_can_still_be_queried_by_using_the_scan_command.rst @@ -0,0 +1,43 @@ +:original_name: mrs_01_1647.html + +.. _mrs_01_1647: + +Why Modified and Deleted Data Can Still Be Queried by Using the Scan Command? +============================================================================= + +Question +-------- + +Why modified and deleted data can still be queried by using the **scan** command? + +.. code-block:: + + scan '',{FILTER=>"SingleColumnValueFilter('','column',=,'binary:')"} + +Answer +------ + +Because of the scalability of HBase, all values specific to the versions in the queried column are all matched by default, even if the values have been modified or deleted. For a row where column matching has failed (that is, the column does not exist in the row), the HBase also queries the row. + +If you want to query only the new values and rows where column matching is successful, you can use the following statement: + +.. code-block:: + + scan '',{FILTER=>"SingleColumnValueFilter('','column',=,'binary:',true,true)"} + +This command can filter all rows where column query has failed. It queries only the latest values of the current data in the table; that is, it does not query the values before modification or the deleted values. + +.. note:: + + The related parameters of **SingleColumnValueFilter** are described as follows: + + SingleColumnValueFilter(final byte[] family, final byte[] qualifier, final CompareOp compareOp, ByteArrayComparable comparator, final boolean filterIfMissing, final boolean latestVersionOnly) + + Parameter description: + + - family: family of the column to be queried. + - qualifier: column to be queried. + - compareOp: comparison operation, such as = and >. + - comparator: target value to be queried. + - filterIfMissing: whether a row is filtered out if the queried column does not exist. 
The default value is false. + - latestVersionOnly: whether values of the latest version are queried. The default value is false. diff --git a/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_other_services_become_unstable_if_hbase_sets_up_a_large_number_of_connections_over_the_network_port.rst b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_other_services_become_unstable_if_hbase_sets_up_a_large_number_of_connections_over_the_network_port.rst new file mode 100644 index 0000000..3c6347e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/common_issues_about_hbase/why_other_services_become_unstable_if_hbase_sets_up_a_large_number_of_connections_over_the_network_port.rst @@ -0,0 +1,37 @@ +:original_name: mrs_01_1642.html + +.. _mrs_01_1642: + +Why Other Services Become Unstable If HBase Sets up A Large Number of Connections over the Network Port? +======================================================================================================== + +Question +-------- + +Why other services become unstable if HBase sets up a large number of connections over the network port? + +Answer +------ + +When the OS command **lsof** or **netstat** is run, it is found that many TCP connections are in the CLOSE_WAIT state and the owner of the connections is HBase RegionServer. This can cause exhaustion of network ports or limit exceeding of HDFS connections, resulting in instability of other services. The HBase CLOSE_WAIT phenomenon is the HBase mechanism. + +The reason why HBase CLOSE_WAIT occurs is as follows: HBase data is stored in the HDFS as HFile, which can be called StoreFiles. HBase functions as the client of the HDFS. When HBase creates a StoreFile or starts loading a StoreFile, it creates an HDFS connection. When the StoreFile is created or loaded successfully, the HDFS considers that the task is completed and transfers the connection close permission to HBase. However, HBase may choose not to close the connection to ensure real-time response; that is, HBase may maintain the connection so that it can quickly access the corresponding data file upon request. In this case, the connection is in the CLOSE_WAIT, which indicates that the connection needs to be closed by the client. + +When a StoreFile will be created: HBase executes the Flush operation. + +When Flush is executed: The data written by HBase is first stored in memstore. The Flush operation is performed only when the usage of memstore reaches the threshold or the **flush** command is run to write data into the HDFS. + +To resolve the issue, use either of the following methods: + +Because of the HBase connection mechanism, the number of StoreFiles must be restricted to reduce the occupation of HBase ports. This can be achieved by triggering HBase's the compaction action, that is, HBase file merging. + +Method 1: On HBase shell client, run **major_compact**. + +Method 2: Compile HBase client code to invoke the compact method of the HBaseAdmin class to trigger HBase's compaction action. + +If the HBase port occupation issue cannot be resolved through compact, it indicates that the HBase usage has reached the bottleneck. In such a case, you are advised to perform the following: + +- Check whether the initial number of Regions configured in the table is appropriate. +- Check whether useless data exists. + +If useless data exists, delete the data to reduce the number of storage files for the HBase. 
If the preceding conditions are not met, then you need to consider a capacity expansion. diff --git a/doc/component-operation-guide-lts/source/using_hbase/community_bulkload_tool.rst b/doc/component-operation-guide-lts/source/using_hbase/community_bulkload_tool.rst new file mode 100644 index 0000000..7fd7a3c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/community_bulkload_tool.rst @@ -0,0 +1,8 @@ +:original_name: mrs_01_1612.html + +.. _mrs_01_1612: + +Community BulkLoad Tool +======================= + +The Apache HBase official website provides the function of importing data in batches. For details, see the description of the **Import** and **ImportTsv** tools at http://hbase.apache.org/2.2/book.html#tools. diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst new file mode 100644 index 0000000..6da6c3b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst @@ -0,0 +1,113 @@ +:original_name: en-us_topic_0000001295898904.html + +.. _en-us_topic_0000001295898904: + +Configuring HBase Data Compression and Encoding +=============================================== + +Scenario +-------- + +HBase encodes data blocks in HFiles to reduce duplicate keys in KeyValues, reducing used space. Currently, the following data block encoding modes are supported: NONE, PREFIX, DIFF, FAST_DIFF, and ROW_INDEX_V1. NONE indicates that data blocks are not encoded. HBase also supports compression algorithms for HFile compression. The following algorithms are supported by default: NONE, GZ, SNAPPY, and ZSTD. NONE indicates that HFiles are not compressed. + +The two methods are used on the HBase column family. They can be used together or separately. + +Prerequisites +------------- + +- You have installed an HBase client. For example, the client is installed in **opt/client**. +- If authentication has been enabled for HBase, you must have the corresponding operation permissions. For example, you must have the creation (C) or administration (A) permission on the corresponding namespace or higher-level items to create a table, and the creation (C) or administration (A) permission on the created table or higher-level items to modify a table. For details about how to grant permissions, see :ref:`Creating HBase Roles `. + +Procedure +--------- + +**Setting data block encoding and compression algorithms during creation** + +- **Method 1: Using hbase shell** + + #. Log in to the node where the client is installed as the client installation user. + + #. Run the following command to go to the client directory: + + **cd /opt/client** + + #. Run the following command to configure environment variables: + + **source bigdata_env** + + #. If the Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step: + + **kinit** *Component service user* + + For example, **kinit hbaseuser**. + + #. Run the following HBase client command: + + **hbase shell** + + #. Create a table. + + **create '**\ *t1*\ **', {NAME => '**\ *f1*\ **', COMPRESSION => '**\ *SNAPPY*\ **', DATA_BLOCK_ENCODING => '**\ *FAST_DIFF*\ **'}** + + .. note:: + + - *t1*: indicates the table name. + - *f1*: indicates the column family name. 
+ - *SNAPPY*: indicates the column family uses the SNAPPY compression algorithm. + - *FAST_DIFF*: indicates FAST_DIFF is used for encoding. + - The parameter in the braces specifies the column family. You can specify multiple column families using multiple braces and separate them by commas (,). For details about table creation statements, run the **help 'create'** statement in the HBase shell. + +- **Method 2: Using Java APIs** + + The following code snippet shows only how to set the encoding and compression modes of a column family when creating a table. + + .. code-block:: + + TableDescriptorBuilder htd = TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"));// Create a descriptor for table t1. + ColumnFamilyDescriptorBuilder hcd = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1"));// Create a builder for column family f1. + hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);// Set the encoding mode of column family f1 to FAST_DIFF. + hcd.setCompressionType(Compression.Algorithm.SNAPPY);// Set the compression algorithm of column family f1 to SNAPPY. + htd.setColumnFamily(hcd.build());// Add the column family f1 to the descriptor of table t1. + +**Setting or modifying the data block encoding mode and compression algorithm for an existing table** + +- **Method 1: Using hbase shell** + + #. Log in to the node where the client is installed as the client installation user. + + #. Run the following command to go to the client directory: + + **cd /opt/client** + + #. Run the following command to configure environment variables: + + **source bigdata_env** + + #. If the Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step: + + **kinit** *Component service user* + + For example, **kinit hbaseuser**. + + #. Run the following HBase client command: + + **hbase shell** + + #. Run the following command to modify the table: + + **alter '**\ *t1*\ **', {NAME => '**\ *f1*\ **', COMPRESSION => '**\ *SNAPPY*\ **', DATA_BLOCK_ENCODING => '**\ *FAST_DIFF*\ **'}** + +- **Method 2: Using Java APIs** + + The following code snippet shows only how to modify the encoding and compression modes of a column family in an existing table. For complete code for modifying a table and how to use the code to modify a table, see "HBase Development Guide". + + .. code-block:: + + TableDescriptor htd = admin.getDescriptor(TableName.valueOf("t1"));// Obtain the descriptor of table t1. + ColumnFamilyDescriptor originCF = htd.getColumnFamily(Bytes.toBytes("f1"));// Obtain the descriptor of column family f1. + ColumnFamilyDescriptorBuilder hcd = ColumnFamilyDescriptorBuilder.newBuilder(originCF);// Create a builder based on the existing column family attributes. + hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);// Change the encoding mode of the column family to FAST_DIFF. + hcd.setCompressionType(Compression.Algorithm.SNAPPY);// Change the compression algorithm of the column family to SNAPPY. + admin.modifyColumnFamily(TableName.valueOf("t1"), hcd.build());// Submit to the server to modify the attributes of column family f1. + + After the modification, the new encoding and compression settings take effect on existing HFiles only after the next compaction.
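+As a quick check after either method, you can view the column family attributes in the HBase shell. The following session is only an illustrative sketch; the exact output format depends on the HBase version.
+
+.. code-block::
+
+   hbase(main):001:0> describe 't1'
+   ...
+   {NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY', ...}
+   hbase(main):002:0> major_compact 't1'
+
+Running **major_compact** is optional; it forces existing HFiles to be rewritten with the new settings instead of waiting for the next natural compaction.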
diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_dr.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_dr.rst new file mode 100644 index 0000000..8db3342 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_dr.rst @@ -0,0 +1,301 @@ +:original_name: mrs_01_1609.html + +.. _mrs_01_1609: + +Configuring HBase DR +==================== + +Scenario +-------- + +HBase disaster recovery (DR), a key feature that is used to ensure high availability (HA) of the HBase cluster system, provides the real-time remote DR function for HBase. HBase DR provides basic O&M tools, including tools for maintaining and re-establishing DR relationships, verifying data, and querying data synchronization progress. To implement real-time DR, back up data of an HBase cluster to another HBase cluster. DR in the HBase table common data writing and BulkLoad batch data writing scenarios is supported. + +Prerequisites +------------- + +- The active and standby clusters are successfully installed and started, and you have the administrator permissions on the clusters. + +- Ensure that the network connection between the active and standby clusters is normal and ports are available. +- If the active cluster is deployed in security mode and is not managed by one FusionInsight Manager, cross-cluster trust relationship has been configured for the active and standby clusters.. If the active cluster is deployed in normal mode, no cross-cluster mutual trust is required. +- Cross-cluster replication has been configured for the active and standby clusters. For details, see :ref:`Enabling Cross-Cluster Copy `. +- Time is consistent between the active and standby clusters and the NTP service on the active and standby clusters uses the same time source. +- Mapping relationships between the names of all hosts in the active and standby clusters and IP addresses have been configured in the hosts files of all the nodes in the active and standby clusters and of the node where the active cluster client resides. +- The network bandwidth between the active and standby clusters is determined based on service volume, which cannot be less than the possible maximum service volume. +- The MRS versions of the active and standby clusters must be the same. +- The scale of the standby cluster must be greater than or equal to that of the active cluster. + +Constraints +----------- + +- Although DR provides the real-time data replication function, the data synchronization progress is affected by many factors, such as the service volume in the active cluster and the health status of the standby cluster. In normal cases, the standby cluster should not take over services. In extreme cases, system maintenance personnel and other decision makers determine whether the standby cluster takes over services according to the current data synchronization indicators. + +- HBase clusters must be deployed in active/standby mode. +- Table-level operations on the DR table of the standby cluster are forbidden, such as modifying the table attributes and deleting the table. Misoperations on the standby cluster will cause data synchronization failure of the active cluster. As a result, table data in the standby cluster is lost. 
+- If the DR data synchronization function is enabled for HBase tables of the active cluster, the DR table structure of the standby cluster needs to be modified to ensure table structure consistency between the active and standby clusters during table structure modification. + +Procedure +--------- + +**Configuring the common data writing DR parameters for the active cluster** + +#. Log in to Manager of the active cluster. + +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **Configurations** and click **All Configurations**. The HBase configuration page is displayed. + +#. (Optional) :ref:`Table 1 ` describes the optional configuration items during HBase DR. You can set the parameters based on the description or use the default values. + + .. _mrs_01_1609__en-us_topic_0000001173949368_tcc2ebdc7794f4718bf8175f779496069: + + .. table:: **Table 1** Optional configuration items + + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Navigation Path | Parameter | Default Value | Description | + +============================+==============================================+===============+=========================================================================================================================================================================================================================================================================================================================================================+ + | HMaster > Performance | hbase.master.logcleaner.ttl | 600000 | Specifies the retention period of HLog. If the value is set to **604800000** (unit: millisecond), the retention period of HLog is 7 days. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hbase.master.cleaner.interval | 60000 | Interval for the HMaster to delete historical HLog files. The HLog that exceeds the configured period will be automatically deleted. You are advised to set it to the maximum value to save more HLogs. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | RegionServer > Replication | replication.source.size.capacity | 16777216 | Maximum size of edits, in bytes. If the edit size exceeds the value, HLog edits will be sent to the standby cluster. 
| + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.nb.capacity | 25000 | Maximum number of edits, which is another condition for triggering HLog edits to be sent to the standby cluster. After data in the active cluster is synchronized to the standby cluster, the active cluster reads and sends data in HLog according to this parameter value. This parameter is used together with **replication.source.size.capacity**. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.maxretriesmultiplier | 10 | Maximum number of retries when an exception occurs during replication. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.sleepforretries | 1000 | Retry interval (Unit: ms) | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hbase.regionserver.replication.handler.count | 6 | Number of replication RPC server instances on RegionServer | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +**Configuring the BulkLoad batch data writing DR parameters for the active cluster** + +4. Determine whether to enable the BulkLoad batch data writing DR function. + + If yes, go to :ref:`5 `. + + If no, go to :ref:`8 `. + +5. .. _mrs_01_1609__en-us_topic_0000001173949368_l4716d1d3802e4b24ba3b3b49cf396866: + + Choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **Configurations** and click **All Configurations**. The HBase configuration page is displayed. + +6. 
Search for **hbase.replication.bulkload.enabled** and change its value to **true** to enable the BulkLoad batch data writing DR function. + +7. Search for **hbase.replication.cluster.id** and change the HBase ID of the active cluster. The ID is used by the standby cluster to connect to the active cluster. The value can contain uppercase letters, lowercase letters, digits, and underscores (_), and cannot exceed 30 characters. + +**Restarting the HBase service and install the client** + +8. .. _mrs_01_1609__en-us_topic_0000001173949368_l3a38ddf2af1b455995b7223d0fe94c23: + + Click **Save**. In the displayed dialog box, click **OK**. Restart the HBase service. + +9. In the active and standby clusters, choose **Cluster >** **Name of the desired cluster** **> Service > HBase > More > Download Client** to download the client and install it. + +**Adding the DR relationship between the active and standby clusters** + +10. Log in as user **hbase** to the HBase shell page of the active cluster. + +11. Run the following command on HBase Shell to create the DR synchronization relationship between the active cluster HBase and the standby cluster HBase. + + **add_peer '**\ *Standby cluster ID*\ **', CLUSTER_KEY => "**\ *ZooKeeper service IP address in the standby cluster* **", CONFIG => {"hbase.regionserver.kerberos.principal" => "**\ *Standby cluster RegionServer principal*\ **", "hbase.master.kerberos.principal" => "**\ *Standby cluster HMaster principal*\ **"}** + + - The standby cluster ID indicates the ID for the active cluster to recognize the standby cluster. Enter an ID. The value can be specified randomly. Digits are recommended. + - The ZooKeeper address of the standby cluster includes the service IP address of ZooKeeper, the port for listening to client connections, and the HBase root directory of the standby cluster on ZooKeeper. + - Search for **hbase.master.kerberos.principal** and **hbase.regionserver.kerberos.principal** in the HBase **hbase-site.xml** configuration file of the standby cluster. + + For example, to add the DR relationship between the active and standby clusters, run the **add_peer '**\ *Standby cluster ID*\ **', CLUSTER_KEY => "192.168.40.2,192.168.40.3,192.168.40.4:24002:/hbase", CONFIG => {"hbase.regionserver.kerberos.principal" => "hbase/hadoop.hadoop.com@HADOOP.COM", "hbase.master.kerberos.principal" => "hbase/hadoop.hadoop.com@HADOOP.COM"}** + +12. (Optional) If the BulkLoad batch data write DR function is enabled, the HBase client configuration of the active cluster must be copied to the standby cluster. + + - Create the **/hbase/replicationConf/**\ **hbase.replication.cluster.id of the active cluster** directory in the HDFS of the standby cluster. + + - HBase client configuration file, which is copied to the **/hbase/replicationConf/hbase.replication.cluster.id of the active cluster** directory of the HDFS of the standby cluster. + + Example: **hdfs dfs -put HBase/hbase/conf/core-site.xml HBase/hbase/conf/hdfs-site.xml HBase/hbase/conf/yarn-site.xml hdfs://NameNode IP:25000/hbase/replicationConf/source_cluster** + +**Enabling HBase DR to synchronize data** + +13. Check whether a naming space exists in the HBase service instance of the standby cluster and the naming space has the same name as the naming space of the HBase table for which the DR function is to be enabled. + + - If the same namespace exists, go to :ref:`14 `. + - If no, create a naming space with the same name in the HBase shell of the standby cluster and go to :ref:`14 `. + +14. .. 
_mrs_01_1609__en-us_topic_0000001173949368_li254519151517: + + In the HBase shell of the active cluster, run the following command as user **hbase** to enable the real-time DR function for the table data of the active cluster to ensure that the data modified in the active cluster can be synchronized to the standby cluster in real time. + + You can only synchronize the data of one HTable at a time. + + **enable_table_replication '**\ *table name*\ **'** + + .. note:: + + - If the standby cluster does not contain a table with the same name as the table for which real-time synchronization is to be enabled, the table is automatically created. + - If a table with the same name as the table for which real-time synchronization is to be enabled exists in the standby cluster, the structures of the two tables must be the same. + - If the encryption algorithm SMS4 or AES is configured for '*Table name*', the function for synchronizing data from the active cluster to the standby cluster cannot be enabled for the HBase table. + - If the standby cluster is offline or has tables with the same name but different structures, the DR function cannot be enabled. + - If the DR data synchronization function is enabled for some Phoenix tables in the active cluster, the standby cluster cannot have common HBase tables with the same names as the Phoenix tables in the active cluster. Otherwise, the DR function fails to be enabled or the tables with the names in the standby cluster cannot be used properly. + - If the DR data synchronization function is enabled for Phoenix tables in the active cluster, you need to enable the DR data synchronization function for the metadata tables of the Phoenix tables. The metadata tables include SYSTEM.CATALOG, SYSTEM.FUNCTION, SYSTEM.SEQUENCE, and SYSTEM.STATS. + - If the DR data synchronization function is enabled for HBase tables of the active cluster, after adding new indexes to HBase tables, you need to manually add secondary indexes to DR tables in the standby cluster to ensure secondary index consistency between the active and standby clusters. + +15. (Optional) If HBase does not use Ranger, run the following command as user **hbase** in the HBase shell of the active cluster to enable the real-time permission to control data DR function for the HBase tables in the active cluster. + + **enable_table_replication 'hbase:acl'** + +**Creating Users** + +16. Log in to FusionInsight Manager of the standby cluster, choose **System** > **Permission** > **Role** > **Create Role** to create a role, and add the same permission for the standby data table to the role based on the permission of the HBase source data table of the active cluster. +17. Choose **System** > **Permission** > **User** > **Create** to create a user. Set the **User Type** to **Human-Machine** or **Machine-Machine** based on service requirements and add the user to the created role. Access the HBase DR data of the standby cluster as the newly created user. + + .. note:: + + - After the permission of the active HBase source data table is modified, to ensure that the standby cluster can properly read data, modify the role permission for the standby cluster. + - If the current component uses Ranger for permission control, you need to configure permission management policies based on Ranger. For details, see :ref:`Adding a Ranger Access Permission Policy for HBase `. + +**Synchronizing the table data of the active cluster** + +18. 
After HBase DR is configured and data synchronization is enabled, check whether tables and data exist in the active cluster and whether the historical data needs to be synchronized to the standby cluster. + + - If yes, a table exists and data needs to be synchronized. Log in as the HBase table user to the node where the HBase client of the active cluster is installed and run the kinit username to authenticate the identity. The user must have the read and write permissions on tables and the execute permission on the **hbase:meta** table. Then go to :ref:`19 `. + - If no, no further action is required. + +19. .. _mrs_01_1609__en-us_topic_0000001173949368_li2511113725912: + + The HBase DR configuration does not support automatic synchronization of historical data in tables. You need to back up the historical data of the active cluster and then manually restore the historical data in the standby cluster. + + Manual recovery refers to the recovery of a single table, which can be performed through Export, DistCp, or Import. + + To manually recover a single table, perform the following steps: + + a. Export table data from the active cluster. + + **hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true** *Table name* *Directory where the source data is stored* + + Example: **hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1 /user/hbase/t1** + + b. Copy the data that has been exported to the standby cluster. + + **hadoop distcp** *directory where the source data is stored on the active cluster* **hdfs://**\ *ActiveNameNodeIP:8020/directory where the source data is stored on the standby cluster* + + **ActiveNameNodeIP** indicates the IP address of the active NameNode in the standby cluster. + + Example: **hadoop distcp /user/hbase/t1 hdfs://192.168.40.2:8020/user/hbase/t1** + + c. Import data to the standby cluster as the HBase table user of the standby cluster. + + On the HBase shell screen of the standby cluster, run the following command as user **hbase** to retain the data writing status: + + **set_clusterState_active** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_active + => true + + **hbase org.apache.hadoop.hbase.mapreduce.Import** *-Dimport.bulk.output=Directory where the output data is stored in the standby cluster Table name Directory where the source data is stored in the standby cluster* + + **hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles** *Directory where the output data is stored in the standby cluster Table name* + + Example: + + .. code-block:: + + hbase(main):001:0> set_clusterState_active + => true + + **hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1 t1 /user/hbase/t1** + + **hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1 t1** + +20. Run the following command on the HBase client to check the synchronized data of the active and standby clusters. After the DR data synchronization function is enabled, you can run this command to check whether the newly synchronized data is consistent. + + **hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime**\ *=Start time* **--endtime**\ *=End time* *Column family name ID of the standby cluster Table name* + + .. note:: + + - The start time must be earlier than the end time. + - The values of **starttime** and **endtime** must be in the timestamp format. 
You need to run **date -d "2015-09-30 00:00:00" +%s to** change a common time format to a timestamp format. + +**Specify the data writing status for the active and standby clusters.** + +21. On the HBase shell screen of the active cluster, run the following command as user **hbase** to retain the data writing status: + + **set_clusterState_active** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_active + => true + +22. On the HBase shell screen of the standby cluster, run the following command as user **hbase** to retain the data read-only status: + + **set_clusterState_standby** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_standby + => true + +Related Commands +---------------- + +.. table:: **Table 2** HBase DR + + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Operation | Command | Description | + +=================================================================================+================================================================================================================================================================================================================================================================================+=======================================================================================================================================================================================================================================================================================================================+ + | Set up a DR relationship. | add_peer'*Standby cluster ID*', CLUSTER_KEY => "*Standby cluster ZooKeeper service IP address*", CONFIG => {"hbase.regionserver.kerberos.principal" => "*Standby cluster RegionServer principal*", "hbase.master.kerberos.principal" => "*Standby cluster HMaster principal*"} | Set up the relationship between the active cluster and the standby cluster. | + | | | | + | | **add_peer '1','zk1,zk2,zk3:2181:/hbase1'** | If BulkLoad batch data write DR is enabled: | + | | | | + | | 2181: port number of ZooKeeper in the cluster | - Create the **/hbase/replicationConf/hbase.replication.cluster.id of the active cluster** directory in the HDFS of the standby cluster. | + | | | - HBase client configuration file, which is copied to the **/hbase/replicationConf/hbase.replication.cluster.id of the active cluster** directory of the HDFS of the standby cluster. 
| + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Remove the DR relationship. | **remove_peer** *'Standby cluster ID'* | Remove standby cluster information from the active cluster. | + | | | | + | | Example: | | + | | | | + | | **remove_peer '1'** | | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Querying the DR Relationship | **list_peers** | Query standby cluster information (mainly Zookeeper information) in the active cluster. | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enable the real-time user table synchronization function. | **enable_table_replication** *'Table name'* | Synchronize user tables from the active cluster to the standby cluster. | + | | | | + | | Example: | | + | | | | + | | **enable_table_replication 't1'** | | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Disable the real-time user table synchronization function. | **disable_table_replication** *'Table name'* | Do not synchronize user tables from the active cluster to the standby cluster. 
| + | | | | + | | Example: | | + | | | | + | | **disable_table_replication 't1'** | | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Verify data of the active and standby clusters. | **bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=**\ *Start time* **--endtime=**\ *End time* *Column family name Standby cluster ID Table name* | Verify whether data of the specified table is the same between the active cluster and the standby cluster. | + | | | | + | | | The description of the parameters in this command is as follows: | + | | | | + | | | - Start time: If start time is not specified, the default value **0** will be used. | + | | | - End time: If end time is not specified, the time when the current operation is submitted will be used by default. | + | | | - Table name: If a table name is not entered, all user tables for which the real-time synchronization function is enabled will be verified by default. | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Switch the data writing status. | **set_clusterState_active** | Specifies whether data can be written to the cluster HBase tables. | + | | | | + | | **set_clusterState_standby** | | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Add or update the active cluster HDFS configurations saved in the peer cluster. | **hdfs dfs -put -f HBase/hbase/conf/core-site.xml HBase/hbase/conf/hdfs-site.xml HBase/hbase/conf/yarn-site.xml hdfs://**\ *Standby cluster* **NameNode** **IP:PORT/hbase/replicationConf/**\ *Active cluster*\ **hbase.replication.cluster.id** | Enable DR for data including bulkload data. 
When HDFS parameters are modified in the active cluster, the modification cannot be automatically synchronized from the active cluster to the standby cluster. You need to manually run the command to synchronize configuration. The affected parameters are as follows: | + | | | | + | | | - fs.defaultFS | + | | | - dfs.client.failover.proxy.provider.hacluster | + | | | - dfs.client.failover.connection.retries.on.timeouts | + | | | - dfs.client.failover.connection.retries | + | | | | + | | | For example, change **fs.defaultFS** to **hdfs://hacluster_sale**, | + | | | | + | | | HBase client configuration file, which is copied to the **/hbase/replicationConf/hbase.replication.cluster.id of the active cluster** directory of the HDFS of the standby cluster. | + +---------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_replication.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_replication.rst new file mode 100644 index 0000000..4db0b4e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_replication.rst @@ -0,0 +1,360 @@ +:original_name: mrs_01_0501.html + +.. _mrs_01_0501: + +Configuring HBase Replication +============================= + +Scenario +-------- + +As a key feature to ensure high availability of the HBase cluster system, HBase cluster replication provides HBase with remote data replication in real time. It provides basic O&M tools, including tools for maintaining and re-establishing active/standby relationships, verifying data, and querying data synchronization progress. To achieve real-time data replication, you can replicate data from the HBase cluster to another one. + +Prerequisites +------------- + +- The active and standby clusters have been successfully installed and started (the cluster status is **Running** on the **Active Clusters** page), and you have the administrator rights of the clusters. + +- The network between the active and standby clusters is normal and ports can be used properly. +- Cross-cluster mutual trust has been configured. For details, see `Configuring Cross-Cluster Mutual Trust Relationships `__. +- If historical data exists in the active cluster and needs to be synchronized to the standby cluster, cross-cluster replication must be configured for the active and standby clusters. For details, see :ref:`Enabling Cross-Cluster Copy `. +- Time is consistent between the active and standby clusters and the Network Time Protocol (NTP) service on the active and standby clusters uses the same time source. +- Mapping relationships between the names of all hosts in the active and standby clusters and service IP addresses have been configured in the **/etc/hosts** file by appending **192.***.***.**\* host1** to the **hosts** file. 
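+
+  A minimal sketch of such a mapping, assuming three hypothetical nodes (replace the IP addresses and host names with the actual values of your clusters), is as follows:
+
+  .. code-block::
+
+     192.168.40.2 host1
+     192.168.40.3 host2
+     192.168.40.4 host3
+
+  Typically, the same entries need to be present on the nodes of both clusters so that host names resolve consistently in either direction.
+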
+- The network bandwidth between the active and standby clusters is determined based on service volume, which cannot be less than the possible maximum service volume. + +Constraints +----------- + +- Despite that HBase cluster replication provides the real-time data replication function, the data synchronization progress is determined by several factors, such as the service loads in the active cluster and the health status of processes in the standby cluster. In normal cases, the standby cluster should not take over services. In extreme cases, system maintenance personnel and other decision makers determine whether the standby cluster takes over services according to the current data synchronization indicators. + +- Currently, the replication function supports only one active cluster and one standby cluster in HBase. +- Typically, do not perform operations on data synchronization tables in the standby cluster, such as modifying table properties or deleting tables. If any misoperation on the standby cluster occurs, data synchronization between the active and standby clusters will fail and data of the corresponding table in the standby cluster will be lost. +- If the replication function of HBase tables in the active cluster is enabled for data synchronization, after modifying the structure of a table in the active cluster, you need to manually modify the structure of the corresponding table in the standby cluster to ensure table structure consistency. + +Procedure +--------- + +**Enable the replication function for the active cluster to synchronize data written by Put.** + +#. Log in to the MRS console, click a cluster name and choose **Components**. + +#. .. _mrs_01_0501__en-us_topic_0000001173631336_li1966213718714: + + Go to the **All Configurations** page of the HBase service. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. + + .. note:: + + If the **Components** tab is not displayed on the cluster details page, synchronize the IAM user first. (In the **Dashboard** area of the cluster details page, click **Click to synchronize** on the right of **IAM User Sync** to synchronize IAM users.) + +#. Choose **RegionServer** > **Replication** and check whether the value of **hbase.replication** is **true**. If the value is **false**, set **hbase.replication** to **true**. + +#. (Optional) Set configuration items listed in :ref:`Table 1 `. You can set the parameters based on the description or use the default values. + + .. _mrs_01_0501__en-us_topic_0000001173631336_table6909942154955: + + .. 
table:: **Table 1** Optional configuration items + + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Navigation Path | Parameter | Default Value | Description | + +============================+==============================================+===============+=========================================================================================================================================================================================================================================================================================================================================================+ + | HMaster > Performance | hbase.master.logcleaner.ttl | 600000 | Time to live (TTL) of HLog files. If the value is set to **604800000** (unit: millisecond), the retention period of HLog is 7 days. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hbase.master.cleaner.interval | 60000 | Interval for the HMaster to delete historical HLog files. The HLog that exceeds the configured period will be automatically deleted. You are advised to set it to the maximum value to save more HLogs. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | RegionServer > Replication | replication.source.size.capacity | 16777216 | Maximum size of edits, in bytes. If the edit size exceeds the value, HLog edits will be sent to the standby cluster. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.nb.capacity | 25000 | Maximum number of edits, which is another condition for triggering HLog edits to be sent to the standby cluster. After data in the active cluster is synchronized to the standby cluster, the active cluster reads and sends data in HLog according to this parameter value. This parameter is used together with **replication.source.size.capacity**. 
| + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.maxretriesmultiplier | 10 | Maximum number of retries when an exception occurs during replication. | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | replication.source.sleepforretries | 1000 | Retry interval (unit: ms) | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hbase.regionserver.replication.handler.count | 6 | Number of replication RPC server instances on RegionServer | + +----------------------------+----------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +**Enable the replication function for the active cluster to synchronize data written by bulkload.** + +5. .. _mrs_01_0501__en-us_topic_0000001173631336_li65160752154955: + + Determine whether to enable bulkload replication. + + .. note:: + + If bulkload import is used and data needs to be synchronized, you need to enable Bulkload replication. + + If yes, go to :ref:`6 `. + + If no, go to :ref:`10 `. + +6. .. _mrs_01_0501__en-us_topic_0000001173631336_li57688977154955: + + Go to the **All Configurations** page of the HBase service parameters by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +7. On the HBase configuration interface of the active and standby clusters, search for **hbase.replication.cluster.id** and modify it. It specifies the HBase ID of the active and standby clusters. For example, the HBase ID of the active cluster is set to **replication1** and the HBase ID of the standby cluster is set to **replication2** for connecting the active cluster to the standby cluster. To save data overhead, the parameter value length is not recommended to exceed 30. + +8. On the HBase configuration interface of the standby cluster, search for **hbase.replication.conf.dir** and modify it. It specifies the HBase configurations of the active cluster client used by the standby cluster and is used for data replication when the bulkload data replication function is enabled. 
The parameter value is a path name, for example, **/home**. + + .. note:: + + - When bulkload replication is enabled, you need to manually place the HBase client configuration files (**core-site.xml**, **hdfs-site.xml**, and **hbase-site.xml**) in the active cluster on all RegionServer nodes in the standby cluster. The actual path for placing the configuration file is **${hbase.replication.conf.dir}/${hbase.replication.cluster.id}**. For example, if **hbase.replication.conf.dir** of the standby cluster is set to **/home** and **hbase.replication.cluster.id** of the active cluster is set to **replication1**, the actual path for placing the configuration files in the standby cluster is **/home/replication1**. You also need to change the corresponding directory and file permissions by running the **chown -R omm:wheel /home/replication1** command. + - You can obtain the client configuration files from the client in the active cluster, for example, the **/opt/client/HBase/hbase/conf** path. For details about how to update the configuration file, see `Updating a Client `__. + +9. On the HBase configuration page of the active cluster, search for and change the value of **hbase.replication.bulkload.enabled** to **true** to enable bulkload replication. + +**Restarting the HBase service and install the client** + +10. .. _mrs_01_0501__en-us_topic_0000001173631336_li6210082154955: + + Save the configurations and restart HBase. + +11. .. _mrs_01_0501__en-us_topic_0000001173631336_li11385192216347: + + In the active and standby clusters choose **Cluster** > **Dashboard** > **More** > **Download Client**. For details about how to update the client configuration file, see `Updating a Client `__. + +**Synchronize table data of the active cluster. (Skip this step if the active cluster has no data.)** + +12. .. _mrs_01_0501__en-us_topic_0000001173631336_li12641483154955: + + Access the HBase shell of the active cluster as user **hbase**. + + a. On the active management node where the client has been updated, run the following command to go to the client directory: + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit hbase** + + d. Run the following HBase client command: + + **hbase shell** + +13. Check whether historical data exists in the standby cluster. If historical data exists and data in the active and standby clusters must be consistent, delete data from the standby cluster first. + + a. On the HBase shell of the standby cluster, run the **list** command to view the existing tables in the standby cluster. + + b. Delete data tables from the standby cluster based on the output list. + + **disable** '*tableName*' + + **drop** '*tableName*' + +14. After HBase replication is configured and data synchronization is enabled, check whether tables and data exist in the active cluster and whether the historical data needs to be synchronized to the standby cluster. + + Run the **list** command to check the existing tables in the active cluster and run the **scan** '*tableName*\ **'** command to check whether the tables contain historical data. + + - If tables exist and data needs to be synchronized, go to :ref:`15 `. + - If no, no further action is required. + +15. .. 
_mrs_01_0501__en-us_topic_0000001173631336_li4226821210491: + + The HBase replication configuration does not support automatic synchronization of historical data in tables. You need to back up the historical data of the active cluster and then manually synchronize the historical data to the standby cluster. + + Manual synchronization refers to the synchronization of a single table that is implemented by Export, distcp, and Import. + + The process for manually synchronizing data of a single table is as follows: + + a. Export table data from the active cluster. + + **hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true** *Table name* *Directory where the source data is stored* + + Example: **hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1 /user/hbase/t1** + + b. Copy the data that has been exported to the standby cluster. + + **hadoop distcp** *Directory for storing source data in the active cluster* **hdfs://**\ *ActiveNameNodeIP*:**9820/** *Directory for storing source data in the standby cluster* + + **ActiveNameNodeIP** indicates the IP address of the active NameNode in the standby cluster. + + Example: **hadoop distcp /user/hbase/t1 hdfs://192.168.40.2:9820/user/hbase/t1** + + c. Import data to the standby cluster as the HBase table user of the standby cluster. + + **hbase org.apache.hadoop.hbase.mapreduce.Import** *-Dimport.bulk.output=Directory where the output data is stored in the standby cluster Table name Directory where the source data is stored in the standby cluster* + + **hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles** *Directory where the output data is stored in the standby cluster Table name* + + For example, **hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1 t1 /user/hbase/t1** and + + **hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1 t1** + +**Add the replication relationship between the active and standby clusters.** + +16. .. _mrs_01_0501__en-us_topic_0000001173631336_li46664485154955: + + Run the following command on the HBase Shell to create the replication synchronization relationship between the active cluster and the standby cluster: + + **add_peer** '*Standby cluster ID',* *CLUSTER_KEY =>* '*ZooKeeper address of the standby cluster*',\ **{HDFS_CONFS => true}** + + - *Standby cluster ID* indicates an ID for the active cluster to recognize the standby cluster. It is recommended that the ID contain letters and digits. + + - The ZooKeeper address of the standby cluster includes the service IP address of ZooKeeper, the port for listening to client connections, and the HBase root directory of the standby cluster on ZooKeeper. + + - **{HDFS_CONFS => true}** indicates that the default HDFS configuration of the active cluster will be synchronized to the standby cluster. This parameter is used for HBase of the standby cluster to access HDFS of the active cluster. If bulkload replication is disabled, you do not need to use this parameter. + + Suppose the standby cluster ID is replication2 and the ZooKeeper address of the standby cluster is **192.168.40.2,192.168.40.3,192.168.40.4:2181:/hbase**. 
+ + - Run the **add_peer** **'replication2',\ CLUSTER_KEY =>** **'192.168.40.2,192.168.40.3,192.168.40.4:2181:/hbase',CONFIG => { "hbase.regionserver.kerberos.principal" => "", "hbase.master.kerberos.principal" => "" }** command for a security cluster and the **add_peer** **'replication2',\ CLUSTER_KEY =>** **'192.168.40.2,192.168.40.3,192.168.40.4:2181:/hbase'** command for a common cluster. + + The **hbase.master.kerberos.principal** and **hbase.regionserver.kerberos.principal** parameters are the Kerberos users of HBase in the security cluster. You can search the **hbase-site.xml** file on the client for the parameter values. For example, if the client is installed in the **/opt/client** directory of the Master node, you can run the **grep "kerberos.principal" /opt/client/HBase/hbase/conf/hbase-site.xml -A1** command to obtain the principal of HBase. See the following figure. + + + .. figure:: /_static/images/en-us_image_0000001295899984.png + :alt: **Figure 1** Obtaining the principal of HBase + + **Figure 1** Obtaining the principal of HBase + + .. note:: + + a. Obtain the ZooKeeper service IP address. + + Log in to the MRS console, click the cluster name, and choose **Components** > **ZooKeeper** > **Instances** to obtain the ZooKeeper service IP address. + + b. On the ZooKeeper service parameter configuration page, search for clientPort, which is the port for the client to connect to the server. + + c. Run the **list_peers** command to check whether the replication relationship between the active and standby clusters is added. If the following information is displayed, the relationship is successfully added. + + .. code-block:: + + hbase(main):003:0> list_peers + PEER_ID CLUSTER_KEY ENDPOINT_CLASSNAME STATE REPLICATE_ALL NAMESPACES TABLE_CFS BANDWIDTH SERIAL + replication2 192.168.0.13,192.168.0.177,192.168.0.25:2181:/hbase ENABLED true 0 false + +**Specify the data writing status for the active and standby clusters.** + +17. On the HBase shell of the active cluster, run the following command to retain the data writing status: + + **set_clusterState_active** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_active + => true + +18. On the HBase shell of the standby cluster, run the following command to retain the data read-only status: + + **set_clusterState_standby** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_standby + => true + +**Enable the HBase replication function to synchronize data.** + +19. Check whether a namespace exists in the HBase service instance of the standby cluster and the namespace has the same name as the namespace of the HBase table for which the replication function is to be enabled. + + On the HBase shell of the standby cluster, run the **list_namespace** command to query the namespace. + + - If the same namespace exists, go to :ref:`20 `. + + - If the same namespace does not exist, on the HBase shell of the standby cluster, run the following command to create a namespace with the same name and go to :ref:`20 `: + + **create_namespace'ns1** + +20. .. _mrs_01_0501__en-us_topic_0000001173631336_li15192291154955: + + On the HBase shell of the active cluster, run the following command to enable real-time replication for tables in the active cluster. This ensures that modified data in the active cluster can be synchronized to the standby cluster in real time. 
+ + You can only synchronize the data of one HTable at a time. + + **enable_table_replication '**\ *Table name*' + + .. note:: + + - If the standby cluster does not contain a table with the same name as the table for which real-time synchronization is to be enabled, the table is automatically created. + + - If a table with the same name as the table for which real-time synchronization is to be enabled exists in the standby cluster, the structures of the two tables must be the same. + + - If the encryption algorithm SMS4 or AES is configured for '*Table name*', the function for synchronizing data from the active cluster to the standby cluster cannot be enabled for the HBase table. + + - If the standby cluster is offline or has tables with the same name but different structures, the replication function cannot be enabled. + + If the standby cluster is offline, start it. + + If the standby cluster has a table with the same name but a different structure, run the **alter** command on the HBase shell of the standby cluster to modify the table structure so that it is the same as the table structure of the active cluster. + +21. .. _mrs_01_0501__en-us_topic_0000001173631336_li3638114154955: + + On the HBase shell of the active cluster, run the following command to enable the real-time replication function for the active cluster to synchronize the HBase permission table: + + **enable_table_replication 'hbase:acl'** + + .. note:: + + If the permission of the HBase source data table in the active cluster is modified, you must also modify the role permission in the standby cluster so that the standby cluster can properly read data. + +**Check the data synchronization status for the active and standby clusters.** + +22. Run the following command on the HBase client to check the synchronized data of the active and standby clusters. After the replication function is enabled, you can run this command to check whether the newly synchronized data is consistent. + + **hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime**\ *=Start time* **--endtime**\ *=End time* *Column family name ID of the standby cluster Table name* + + .. note:: + + - The start time must be earlier than the end time. + - The values of **starttime** and **endtime** must be in the timestamp format. You need to run **date -d "2015-09-30 00:00:00" +%s** to change a common time format to a timestamp format. The command output is a 10-digit number (accurate to the second), but HBase expects a 13-digit number (accurate to the millisecond). Therefore, you need to add three zeros (000) to the end of the command output.
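+
+    The following is only an illustrative sketch of the conversion described in the preceding note; the date values are placeholders:
+
+    .. code-block::
+
+       # Convert a human-readable time into a 10-digit second-level timestamp...
+       starttime_s=$(date -d "2015-09-30 00:00:00" +%s)
+       endtime_s=$(date -d "2015-10-01 00:00:00" +%s)
+       # ...and append three zeros (000) to obtain the 13-digit millisecond values that HBase expects.
+       starttime=${starttime_s}000
+       endtime=${endtime_s}000
+       echo "--starttime=${starttime} --endtime=${endtime}"
+
+    The resulting 13-digit values can then be substituted into the **VerifyReplication** command shown above.
+ + **Switch over active and standby clusters.** + + .. note:: + + a. If the standby cluster needs to be switched over to the active cluster, reconfigure the active/standby relationship by referring to :ref:`2 ` to :ref:`11 ` and :ref:`16 ` to :ref:`21 `. + b. Do not perform :ref:`12 ` to :ref:`15 `. + +Related Commands +---------------- + +..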
table:: **Table 2** HBase replication + + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Operation | Command | Description | + +============================================================+========================================================================================================================================================+===============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | Set up the active/standby relationship. | **add_peer** *'Standby cluster ID',CLUSTER_KEY =>'Standby cluster address'* | Set up the relationship between the active cluster and the standby cluster. To enable bulkload replication, run the **add_peer** *'Standby cluster ID'*\ **,\ CLUSTER_KEY =>** *'Standby cluster address'* command, configure **hbase.replication.conf.dir**, and manually copy the HBase client configuration file in the active cluster to all RegionServer nodes in the standby cluster. For details, see :ref:`5 ` to :ref:`11 `. | + | | | | + | | Examples: | | + | | | | + | | **add_peer '1',CLUSTER_KEY =>** **'zk1,zk2,zk3:2181:/hbase'** | | + | | | | + | | **add_peer '1',CLUSTER_KEY =>** **'zk1,zk2,zk3:2181:/hbase1'** | | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Remove the active/standby relationship. | **remove_peer** *'Standby cluster ID'* | Remove standby cluster information from the active cluster. 
| + | | | | + | | Example: | | + | | | | + | | **remove_peer '1'** | | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Query the active/standby relationship. | **list_peers** | Query standby cluster information (mainly Zookeeper information) in the active cluster. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enable the real-time user table synchronization function. | **enable_table_replication** *'Table name'* | Synchronize user tables from the active cluster to the standby cluster. | + | | | | + | | Example: | | + | | | | + | | **enable_table_replication 't1'** | | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Disable the real-time user table synchronization function. | **disable_table_replication** *'Table name'* | Do not synchronize user tables from the active cluster to the standby cluster. 
| + | | | | + | | Example: | | + | | | | + | | **disable_table_replication 't1'** | | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Verify data of the active and standby clusters. | **bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication** *--starttime --endtime Column family name Standby cluster ID Table name* | Verify whether data of the specified table is the same between the active cluster and the standby cluster. | + | | | | + | | | The description of the parameters in this command is as follows: | + | | | | + | | | - Start time: If start time is not specified, the default value **0** will be used. | + | | | - End time: If end time is not specified, the time when the current operation is submitted will be used by default. | + | | | - Table name: If a table name is not entered, all user tables for which the real-time synchronization function is enabled will be verified by default. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Switch the data writing status. | **set_clusterState_active** | Specifies whether data can be written to the cluster HBase tables. 
| + | | | | + | | **set_clusterState_standby** | | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_region_in_transition_recovery_chore_service.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_region_in_transition_recovery_chore_service.rst new file mode 100644 index 0000000..78ff4ae --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_region_in_transition_recovery_chore_service.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_1010.html + +.. _mrs_01_1010: + +Configuring Region In Transition Recovery Chore Service +======================================================= + +Scenario +-------- + +In a faulty environment, there are possibilities that a region may be stuck in transition for longer duration due to various reasons like slow region server response, unstable network, ZooKeeper node version mismatch. During region transition, client operation may not work properly as some regions will not be available. + +Configuration +------------- + +A chore service should be scheduled at HMaster to identify and recover regions that stay in the transition state for a long time. + +The following table describes the parameters for enabling this function. + +.. table:: **Table 1** Parameters + + +-----------------------------------------------+-----------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +===============================================+===============================================================================================+===============+ + | hbase.region.assignment.auto.recovery.enabled | Configuration parameter used to enable/disable the region assignment recovery thread feature. | true | + +-----------------------------------------------+-----------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_secure_hbase_replication.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_secure_hbase_replication.rst new file mode 100644 index 0000000..839f7c0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_secure_hbase_replication.rst @@ -0,0 +1,69 @@ +:original_name: mrs_01_1009.html + +.. _mrs_01_1009: + +Configuring Secure HBase Replication +==================================== + +Scenario +-------- + +This topic provides the procedure to configure the secure HBase replication during cross-realm Kerberos setup in security mode. + +Prerequisites +------------- + +- Mapping for all the FQDNs to their realms should be defined in the Kerberos configuration file. 
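+
+  A minimal sketch of such a mapping, assuming the example realms **ONE.COM** and **TWO.COM** used below and hypothetical domain suffixes, is the following **[domain_realm]** section in **krb5.conf**:
+
+  .. code-block::
+
+     [domain_realm]
+         .one.com = ONE.COM
+         .two.com = TWO.COM
+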
+- The passwords and keytab files of **ONE.COM** and **TWO.COM** must be the same. + +Procedure +--------- + +#. Create krbtgt principals for the two realms. + + For example, if you have two realms called **ONE.COM** and **TWO.COM**, you need to add the following principals: **krbtgt/ONE.COM@TWO.COM** and **krbtgt/TWO.COM@ONE.COM**. + + Add these two principals at both realms. + + .. code-block:: + + kadmin: addprinc -e "" krbtgt/ONE.COM@TWO.COM + kadmin: addprinc -e "" krbtgt/TWO.COM@ONE.COM + + .. note:: + + There must be at least one common keytab mode between these two realms. + +#. Add rules for creating short names in Zookeeper. + + **Dzookeeper.security.auth_to_local** is a parameter of the ZooKeeper server process. Following is an example rule that illustrates how to add support for the realm called **ONE.COM**. The principal has two members (such as **service/instance@ONE.COM**). + + .. code-block:: + + Dzookeeper.security.auth_to_local=RULE:[2:\$1@\$0](.*@\\QONE.COM\\E$)s/@\\QONE.COM\\E$//DEFAULT + + The above code example adds support for the **ONE.COM** realm in a different realm. Therefore, in the case of replication, you must add a rule for the master cluster realm in the slave cluster realm. **DEFAULT** is for defining the default rule. + +#. Add rules for creating short names in the Hadoop processes. + + The following is the **hadoop.security.auth_to_local** property in the **core-site.xml** file in the slave cluster HBase processes. For example, to add support for the **ONE.COM** realm: + + .. code-block:: + + + hadoop.security.auth_to_local + RULE:[2:$1@$0](.*@\QONE.COM\E$)s/@\QONE.COM\E$//DEFAULT + + + .. note:: + + If replication for bulkload data is enabled, then the same property for supporting the slave realm needs to be added in the **core-site.xml** file in the master cluster HBase processes. + + Example: + + .. code-block:: + + + hadoop.security.auth_to_local + RULE:[2:$1@$0](.*@\QTWO.COM\E$)s/@\QTWO.COM\E$//DEFAULT + diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_the_mob.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_the_mob.rst new file mode 100644 index 0000000..daf0667 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_the_mob.rst @@ -0,0 +1,76 @@ +:original_name: mrs_01_1631.html + +.. _mrs_01_1631: + +Configuring the MOB +=================== + +Scenario +-------- + +In the actual application scenario, data in various sizes needs to be stored, for example, image data and documents. Data whose size is smaller than 10 MB can be stored in HBase. HBase can yield the best read-and-write performance for data whose size is smaller than 100 KB. If the size of data stored in HBase is greater than 100 KB or even reaches 10 MB and the same number of data files are inserted, the total data amount is large, causing frequent compaction and split, high CPU consumption, high disk I/O frequency, and low performance. + +MOB data (100 KB to 10 MB data) is stored in a file system (such as the HDFS) in the HFile format. Files are centrally managed using the expiredMobFileCleaner and Sweeper tools. The addresses and size of files are stored in the HBase store as values. This greatly decreases the compaction and split frequency in HBase and improves performance. + +The MOB function of HBase is enabled by default. For details about related configuration items, see :ref:`Table 1 `. 
To use the MOB function, you need to specify the MOB mode for storing data in the specified column family when creating a table or modifying table attributes. + +Configuration Description +------------------------- + +To enable the HBase MOB function, you need to specify the MOB mode for storing data in the specified column family when creating a table or modifying table attributes. + +Use code to declare that the MOB mode for storing data is used: + +.. code-block:: + + HColumnDescriptor hcd = new HColumnDescriptor("f"); + hcd.setMobEnabled(true); + +Use code to declare that the MOB mode for storing data is used, the unit of MOB_THRESHOLD is byte: + +.. code-block:: + + hbase(main):009:0> create 't3',{NAME => 'd', MOB_THRESHOLD => '102400', IS_MOB => 'true'} + + 0 row(s) in 0.3450 seconds + + => Hbase::Table - t3 + hbase(main):010:0> describe 't3' + Table t3 is ENABLED + + + t3 + + + COLUMN FAMILIES DESCRIPTION + + + {NAME => 'd', MOB_THRESHOLD => '102400', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', + TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', + IN_MEMORY => 'false', IS_MOB => 'true', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} + + 1 row(s) in 0.0170 seconds + +**Navigation path for setting parameters:** + +On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **Configurations** > **All Configurations**. Enter a parameter name in the search box. + +.. _mrs_01_1631__en-us_topic_0000001219230677_t19912a272c204245856b9698b6f04877: + +.. table:: **Table 1** Parameter description + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=====================================+===================================================================================================================================================================================================================================================================================================================================================+=======================+ + | hbase.mob.file.cache.size | Size of the opened file handle cache. If this parameter is set to a large value, more file handles can be cached, reducing the frequency of opening and closing files. However, if this parameter is set to a large value, too many file handles will be opened. The default value is **1000**. This parameter is configured on the ResionServer. | 1000 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.mob.cache.evict.period | Expiration time of cached MOB files in the MOB cache, in seconds. 
| 3600 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.mob.cache.evict.remain.ratio | Ratio of the number of retained files after MOB cache reclamation to the number of cached files. **hbase.mob.cache.evict.remain.ratio** is an algorithm factor. When the number of cached MOB files reaches the product of **hbase.mob.file.cache.size** **hbase.mob.cache.evict.remain.ratio**, cache reclamation is triggered. | 0.5 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.master.mob.ttl.cleaner.period | Interval for deleting expired files, in seconds. The default value is one day (86,400 seconds). | 86400 | + | | | | + | | .. note:: | | + | | | | + | | If the validity period of an MOB file expires, that is, the file has been created for more than 24 hours, the MOB file will be deleted by the tool for deleting expired MOB files. | | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/creating_hbase_roles.rst b/doc/component-operation-guide-lts/source/using_hbase/creating_hbase_roles.rst new file mode 100644 index 0000000..01fa20a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/creating_hbase_roles.rst @@ -0,0 +1,83 @@ +:original_name: mrs_01_1608.html + +.. _mrs_01_1608: + +Creating HBase Roles +==================== + +Scenario +-------- + +This section guides the system administrator to create and configure an HBase role on Manager. The HBase role can set HBase administrator permissions and read (R), write (W), create (C), execute (X), or manage (A) permissions for HBase tables and column families. + +Users can create a table, query/delete/insert/update data, and authorize others to access HBase tables after they set the corresponding permissions for the specified databases or tables on HDFS. + +.. note:: + + - HBase roles can be created in security mode, but cannot be created in normal mode. + - If the current component uses Ranger for permission control, you need to configure related policies based on Ranger for permission management. For details, see :ref:`Adding a Ranger Access Permission Policy for HBase `. + +Prerequisites +------------- + +- The system administrator has understood the service requirements. + +- You have logged in to Manager. + +Procedure +--------- + +#. On Manager, choose **System** > **Permission** > **Role**. + +#. On the displayed page, click **Create Role** and enter a **Role Name** and **Description**. + +#. 
Set **Permission**. For details, see :ref:`Table 1 `. + + HBase permissions: + + - HBase Scope: Authorizes HBase tables. The minimum permission is read (R) and write (W) for columns. + - HBase administrator permission: HBase administrator permissions. + + .. note:: + + Users have the read (R), write (W), create (C), execute (X), and administrate (A) permissions for the tables created by themselves. + + .. _mrs_01_1608__en-us_topic_0000001173470652_t873a9c44357b40cd98cb948ce9438d93: + + .. table:: **Table 1** Setting a role + + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Task | Role Authorization | + +=========================================================================+=========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | Setting the HBase administrator permission | In **Configure Resource Permission**, choose *Name of the desired cluster* > **HBase** and select **HBase Administrator Permission**. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to create tables | a. In **Configure Resource Permission**, choose *Name of the desired cluster* > **HBase** > **HBase Scope**. | + | | b. Click **global**. | + | | c. In the **Permission** column of the specified namespace, select **Create** and **Execute**. For example, select **Create** and **Execute** for the default namespace **default**. 
| + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to write data to tables | a. In **Configure Resource Permission**, choose *Name of the desired cluster* > **HBase** > **HBase Scope** > **global**. | + | | b. In the **Permission** column of the specified namespace, select **Write**. For example, select **Write** for the default namespace **default**. By default, HBase sub-objects inherit the permission from the parent object. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to read data from tables | a. In **Configure Resource Permission**, choose *Name of the desired cluster* > **HBase** > **HBase Scope** > **global**. | + | | b. In the **Permission** column of the specified namespace, select **Read**. For example, select **Read** for the default namespace **default**. By default, HBase sub-objects inherit the permission from the parent object. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to manage namespaces or tables | a. In **Configure Resource Permission**, choose *Name of the desired cluster* > **HBase** > **HBase Scope** > **global**. | + | | b. In the **Permission** column of the specified namespace, select **Manage**. For example, select **Manage** for the default namespace **default**. 
| + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for reading data from or writing data to columns | a. In **Configure Resource Permission**, select *Name of the desired cluster* > **HBase** > **HBase Scope** > **global** and click the specified namespace to display the tables in the namespace. | + | | | + | | b. Click a table. | + | | | + | | c. Click a column family. | + | | | + | | d. Confirm whether you want to create a role? | + | | | + | | - If yes, enter the column name in the **Resource Name** text box. Use commas (,) to separate multiple columns. Select **Read** or **Write**. If there are no columns with the same name in the HBase table, a newly created column with the same name as the existing column has the same permission as the existing one. The column permission is set successfully. | + | | - If no, modify the column permission of the existing HBase role. The columns for which the permission has been separately set are displayed in the table. Go to :ref:`5 `. | + | | | + | | e. .. _mrs_01_1608__en-us_topic_0000001173470652_lc2f15302f1854175993f36524c25bf26: | + | | | + | | To add column permissions for a role, enter the column name in the **Resource Name** text box and set the column permissions. To modify column permissions for a role, enter the column name in the **Resource Name** text box and set the column permissions. Alternatively, you can directly modify the column permissions in the table. If the column permissions are modified in the table and column permissions with the same name are added, the settings cannot be saved. You are advised to modify the column permission of a role directly in the table. The search function is supported. | + +-------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **OK**, and return to the **Role** page. diff --git a/doc/component-operation-guide-lts/source/using_hbase/enabling_cross-cluster_copy.rst b/doc/component-operation-guide-lts/source/using_hbase/enabling_cross-cluster_copy.rst new file mode 100644 index 0000000..7cf7ea4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/enabling_cross-cluster_copy.rst @@ -0,0 +1,56 @@ +:original_name: mrs_01_0502.html + +.. 
_mrs_01_0502: + +Enabling Cross-Cluster Copy +=========================== + +Scenario +-------- + +DistCp is used to copy the data stored on HDFS from a cluster to another cluster. DistCp depends on the cross-cluster copy function, which is disabled by default. This function needs to be enabled in both clusters. + +This section describes how to enable cross-cluster copy. + +Impact on the System +-------------------- + +Yarn needs to be restarted to enable the cross-cluster copy function and cannot be accessed during the restart. + +Prerequisites +------------- + +The **hadoop.rpc.protection** parameter of the two HDFS clusters must be set to the same data transmission mode, which can be **privacy** (encryption enabled) or **authentication** (encryption disabled). + +.. note:: + + Go to the **All Configurations** page by referring to :ref:`Modifying Cluster Service Configuration Parameters ` and search for **hadoop.rpc.protection**. + +Procedure +--------- + +#. Go to the **All Configurations** page of the Yarn service. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. + + .. note:: + + If the **Components** tab is not displayed on the cluster details page, complete IAM user synchronization first. (In the **Dashboard** area of the cluster details page, click **Click to synchronize** on the right of **IAM User Sync** to synchronize IAM users.) + +#. In the navigation pane, choose **Yarn** > **Distcp**. + +#. Set **haclusterX.remotenn1** of **dfs.namenode.rpc-address** to the service IP address and RPC port number of one NameNode instance of the peer cluster, and set **haclusterX.remotenn2** to the service IP address and RPC port number of the other NameNode instance of the peer cluster. Enter a value in the *IP address:port* format. + + .. note:: + + For MRS 1.9.2 or later, log in to the MRS console, click the cluster name, and choose **Components** > **HDFS** > **Instances** to obtain the service IP address of the NameNode instance. + + You can also log in to FusionInsight Manager, and choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Instance** to obtain the service IP address of the NameNode instance. + + **dfs.namenode.rpc-address.haclusterX.remotenn1** and **dfs.namenode.rpc-address.haclusterX.remotenn2** do not distinguish active and standby NameNode instances. The default NameNode RPC port is 9820 and cannot be modified on MRS Manager. + + For example, **10.1.1.1:9820** and **10.1.1.2:9820**. + +#. Save the configuration. On the **Dashboard** tab page, and choose **More** > **Restart Service** to restart the Yarn service. + + **Operation succeeded** is displayed. Click **Finish**. The Yarn service is started successfully. + +#. Log in to the other cluster and repeat the preceding operations. diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_log_overview.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_log_overview.rst new file mode 100644 index 0000000..0043aef --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_log_overview.rst @@ -0,0 +1,98 @@ +:original_name: mrs_01_1056.html + +.. _mrs_01_1056: + +HBase Log Overview +================== + +Log Description +--------------- + +**Log path**: The default storage path of HBase logs is **/var/log/Bigdata/hbase/**\ *Role name*. 
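+
+The role-specific log directories are listed below. As an illustrative sketch only (assuming the default paths above and the HMaster run log directory **hm**), the latest run logs on a node can be checked as follows:
+
+.. code-block::
+
+   # List the most recently modified HMaster run logs.
+   ls -lt /var/log/Bigdata/hbase/hm | head
+   # Trace a run log in real time (replace the placeholder with an actual file name).
+   tail -f /var/log/Bigdata/hbase/hm/<run log file name>
+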
+ +- HMaster: **/var/log/Bigdata/hbase/hm** (run logs) and **/var/log/Bigdata/audit/hbase/hm** (audit logs) +- RegionServer: **/var/log/Bigdata/hbase/rs** (run logs) and **/var/log/Bigdata/audit/hbase/rs** (audit logs) +- ThriftServer: **/var/log/Bigdata/hbase/ts2** (run logs, **ts2** is the instance name) and **/var/log/Bigdata/audit/hbase/ts2** (audit logs, **ts2** is the instance name) + +**Log archive rule**: The automatic log compression and archiving function of HBase is enabled. By default, when the size of a log file exceeds 30 MB, the log file is automatically compressed. The naming rule of a compressed log file is as follows: <*Original log name*>-<*yyyy-mm-dd_hh-mm-ss*>.[*ID*].\ **log.zip** A maximum of 20 latest compressed files are reserved. The number of compressed files can be configured on the Manager portal. + +.. table:: **Table 1** HBase log list + + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Type | Name | Description | + +============+================================================+===============================================================================================================================+ + | Run logs | hbase---.log | HBase system log that records the startup time, startup parameters, and most logs generated when the HBase system is running. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | hbase---.out | Log that records the HBase running environment information. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | ----gc.log | Log that records HBase junk collections. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | checkServiceDetail.log | Log that records whether the HBase service starts successfully. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | hbase.log | Log generated when the HBase service health check script and some alarm check scripts are executed. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | sendAlarm.log | Log that records alarms reported after execution of HBase alarm check scripts. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | hbase-haCheck.log | Log that records the active and standby status of HMaster | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | stop.log | Log that records the startup and stop processes of HBase. 
| + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Audit logs | hbase-audit-.log | Log that records HBase security audit. | + +------------+------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` describes the log levels supported by HBase. The priorities of log levels are FATAL, ERROR, WARN, INFO, and DEBUG in descending order. Logs whose levels are higher than or equal to the specified level are printed. The number of printed logs decreases as the specified log level increases. + +.. _mrs_01_1056__en-us_topic_0000001173631374_tbb25f33f364f4d2d8e14cd48d9f8dd0b: + +.. table:: **Table 2** Log levels + + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + | Level | Description | + +=======+==========================================================================================================================================+ + | FATAL | Logs of this level record fatal error information about the current event processing that may result in a system crash. | + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + | ERROR | Logs of this level record error information about the current event processing, which indicates that system running is abnormal. | + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + | WARN | Logs of this level record abnormal information about the current event processing. These abnormalities will not result in system faults. | + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + | INFO | Logs of this level record normal running status information about the system and events. | + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + | DEBUG | Logs of this level record the system information and system debugging information. | + +-------+------------------------------------------------------------------------------------------------------------------------------------------+ + +To modify log levels, perform the following operations: + +#. Go to the **All Configurations** page of the HBase service. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. +#. On the left menu bar, select the log menu of the target role. +#. Select a desired log level. +#. Save the configuration. In the displayed dialog box, click **OK** to make the configurations take effect. + + .. note:: + + The configurations take effect immediately without the need to restart the service. + +Log Formats +----------- + +The following table lists the HBase log formats. + +.. 
table:: **Table 3** Log formats + + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Component | Format | Example | + +============+==============+==============================================================================================================================+======================================================================================================================================================================================================================+ + | Run logs | HMaster | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-01-19 16:04:53,558 \| INFO \| main \| env:HBASE_THRIFT_OPTS= \| org.apache.hadoop.hbase.util.ServerCommandLine.logProcessInfo(ServerCommandLine.java:113) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | RegionServer | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-01-19 16:05:18,589 \| INFO \| regionserver16020-SendThread(linux-k6da:2181) \| Client will use GSSAPI as SASL mechanism. \| org.apache.zookeeper.client.ZooKeeperSaslClient$1.run(ZooKeeperSaslClient.java:285) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | ThriftServer | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-02-16 09:42:55,371 \| INFO \| main \| loaded properties from hadoop-metrics2.properties \| org.apache.hadoop.metrics2.impl.MetricsConfig.loadFirst(MetricsConfig.java:111) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit logs | HMaster | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-02-16 09:42:40,934 \| INFO \| master:linux-k6da:16000 \| Master: [master:linux-k6da:16000] start operation called. 
\| org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:581) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | RegionServer | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-02-16 09:42:51,063 \| INFO \| main \| RegionServer: [regionserver16020] start operation called. \| org.apache.hadoop.hbase.regionserver.HRegionServer.startRegionServer(HRegionServer.java:2396) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | ThriftServer | ||<*Thread that generates the log*>|<*Message in the log*>|<*Location of the log event*> | 2020-02-16 09:42:55,512 \| INFO \| main \| thrift2 server start operation called. \| org.apache.hadoop.hbase.thrift2.ThriftServer.main(ThriftServer.java:421) | + +------------+--------------+------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_put_performance.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_put_performance.rst new file mode 100644 index 0000000..4124b65 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_put_performance.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1637.html + +.. _mrs_01_1637: + +Improving Put Performance +========================= + +Scenario +-------- + +In the scenario where a large number of requests are continuously put, setting the following two parameters to **false** can greatly improve the Put performance. + +- **hbase.regionserver.wal.durable.sync** + +- **hbase.regionserver.hfile.durable.sync** + +When the performance is improved, there is a low probability that data is lost if three DataNodes are faulty at the same time. Exercise caution when configuring the parameters in scenarios that have high requirements on data reliability. + +Procedure +--------- + +Navigation path for setting parameters: + +On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HBase** > **Configurations** > **All Configurations**. Enter the parameter name in the search box, and change the value. + +.. 
table:: **Table 1** Parameters for improving put performance + + +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ + | Parameter | Description | Value | + +===================+=====================================================================================================================================================================================================================================+=======+ + | hbase.wal.hsync | Specifies whether to enable WAL file durability to make the WAL data persistence on disks. If this parameter is set to **true**, the performance is affected because each WAL file is synchronized to the disk by the Hadoop fsync. | false | + +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ + | hbase.hfile.hsync | Specifies whether to enable the HFile durability to make data persistence on disks. If this parameter is set to true, the performance is affected because each Hfile file is synchronized to the disk by the Hadoop fsync. | false | + +-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_read_performance.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_read_performance.rst new file mode 100644 index 0000000..4cc4026 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_read_performance.rst @@ -0,0 +1,80 @@ +:original_name: mrs_01_1018.html + +.. _mrs_01_1018: + +Improving Real-time Data Read Performance +========================================= + +Scenario +-------- + +HBase data needs to be read. + +Prerequisites +------------- + +The get or scan interface of HBase has been invoked and data is read in real time from HBase. + +Procedure +--------- + +- **Data reading server tuning** + + Parameter portal: + + Go to the **All Configurations** page of the HBase service. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. + + .. 
table:: **Table 1** Configuration items that affect real-time data reading + + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +==================================+=================================================================================================================================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | GC_OPTS | You can increase HBase memory to improve HBase performance because read and write operations are performed in HBase memory. | - HMaster | + | | | | + | | **HeapSize** and **NewSize** need to be adjusted. When you adjust **HeapSize**, set **Xms** and **Xmx** to the same value to avoid performance problems when JVM dynamically adjusts **HeapSize**. Set **NewSize** to 1/8 of **HeapSize**. | -server -Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M | + | | | | + | | - **HMaster**: If HBase clusters enlarge and the number of Regions grows, properly increase the **GC_OPTS** parameter value of the HMaster. | - Region Server | + | | - **RegionServer**: A RegionServer needs more memory than an HMaster. If sufficient memory is available, increase the **HeapSize** value. | | + | | | -server -Xms6G -Xmx6G -XX:NewSize=1024M -XX:MaxNewSize=1024M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M | + | | .. 
note:: | | + | | | | + | | When the value of **HeapSize** for the active HMaster is 4 GB, the HBase cluster can support 100,000 regions. Empirically, each time 35,000 regions are added to the cluster, the value of **HeapSize** must be increased by 2 GB. It is recommended that the value of **HeapSize** for the active HMaster not exceed 32 GB. | | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.regionserver.handler.count | Indicates the number of requests that RegionServer can process concurrently. If the parameter is set to an excessively large value, threads will compete fiercely. If the parameter is set to an excessively small value, requests will be waiting for a long time in RegionServer, reducing the processing capability. You can add threads based on resources. | 200 | + | | | | + | | It is recommended that the value be set to 100 to 300 based on the CPU usage. | | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hfile.block.cache.size | HBase cache sizes affect query efficiency. Set cache sizes based on query modes and query record distribution. If random query is used to reduce the hit ratio of the buffer, you can reduce the buffer size. | When **offheap** is disabled, the default value is **0.25**. When **offheap** is enabled, the default value is **0.1**. 
| + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + If read and write operations are performed at the same time, the performance of the two operations affects each other. If flush and compaction operations are frequently performed due to data writes, a large number of disk I/O operations are occupied, affecting read performance. If a large number of compaction operations are blocked due to write operations, multiple HFiles exist in the region, affecting read performance. Therefore, if the read performance is unsatisfactory, you need to check whether the write configurations are proper. + +- **Data reading client tuning** + + When scanning data, you need to set **caching** (the number of records read from the server at a time. The default value is **1**.). If the default value is used, the read performance will be extremely low. + + If you do not need to read all columns of a piece of data, specify the columns to be read to reduce network I/O. + + If you only need to read the row key, add a filter (FirstKeyOnlyFilter or KeyOnlyFilter) that only reads the row key. + +- **Data table reading design optimization** + + .. table:: **Table 2** Parameters affecting real-time data reading + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=======================+==========================================================================================================================================================================================================================================================================+=======================+ + | COMPRESSION | The compression algorithm compresses blocks in HFiles. For compressible data, configure the compression algorithm to efficiently reduce disk I/Os and improve performance. | NONE | + | | | | + | | .. note:: | | + | | | | + | | Some data cannot be efficiently compressed. For example, a compressed figure can hardly be compressed again. The common compression algorithm is SNAPPY, because it has a high encoding/decoding speed and acceptable compression rate. 
| | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | BLOCKSIZE | Different block sizes affect HBase data read and write performance. You can configure sizes for blocks in an HFile. Larger blocks have a higher compression rate. However, they have poor performance in random data read, because HBase reads data in a unit of blocks. | 65536 | + | | | | + | | Set the parameter to 128 KB or 256 KB to improve data write efficiency without greatly affecting random read performance. The unit is byte. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | DATA_BLOCK_ENCODING | Encoding method of the block in an HFile. If a row contains multiple columns, set **FAST_DIFF** to save data storage space and improve performance. | NONE | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_write_performance.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_write_performance.rst new file mode 100644 index 0000000..263ad51 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_real-time_data_write_performance.rst @@ -0,0 +1,119 @@ +:original_name: mrs_01_1017.html + +.. _mrs_01_1017: + +Improving Real-time Data Write Performance +========================================== + +Scenario +-------- + +Scenarios where data needs to be written to HBase in real time, or large-scale and consecutive put scenarios + +Prerequisites +------------- + +The HBase put or delete interface can be used to save data to HBase. + +Procedure +--------- + +- **Data writing server tuning** + + Parameter portal: + + Go to the **All Configurations** page of the HBase service. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. + + .. 
table:: **Table 1** Configuration items that affect real-time data writing + + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +===============================================+================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | hbase.wal.hsync | Controls the synchronization degree when HLogs are written to the HDFS. If the value is **true**, HDFS returns only when data is written to the disk. If the value is **false**, HDFS returns when data is written to the OS cache. | true | + | | | | + | | Set the parameter to **false** to improve write performance. 
| | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hfile.hsync | Controls the synchronization degree when HFiles are written to the HDFS. If the value is **true**, HDFS returns only when data is written to the disk. If the value is **false**, HDFS returns when data is written to the OS cache. | true | + | | | | + | | Set the parameter to **false** to improve write performance. | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | GC_OPTS | You can increase HBase memory to improve HBase performance because read and write operations are performed in HBase memory. **HeapSize** and **NewSize** need to be adjusted. When you adjust **HeapSize**, set **Xms** and **Xmx** to the same value to avoid performance problems when JVM dynamically adjusts **HeapSize**. Set **NewSize** to 1/8 of **HeapSize**. | - HMaster | + | | | | + | | - **HMaster**: If HBase clusters enlarge and the number of Regions grows, properly increase the **GC_OPTS** parameter value of the HMaster. | -server -Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M | + | | - **RegionServer**: A RegionServer needs more memory than an HMaster. 
If sufficient memory is available, increase the **HeapSize** value. | | + | | | - Region Server | + | | .. note:: | | + | | | -server -Xms6G -Xmx6G -XX:NewSize=1024M -XX:MaxNewSize=1024M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M | + | | When the value of **HeapSize** for the active HMaster is 4 GB, the HBase cluster can support 100,000 regions. Empirically, each time 35,000 regions are added to the cluster, the value of **HeapSize** must be increased by 2 GB. It is recommended that the value of **HeapSize** for the active HMaster not exceed 32 GB. | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.regionserver.handler.count | Indicates the number of RPC server instances started on RegionServer. If the parameter is set to an excessively large value, threads will compete fiercely. If the parameter is set to an excessively small value, requests will be waiting for a long time in RegionServer, reducing the processing capability. You can add threads based on resources. | 200 | + | | | | + | | It is recommended that the value be set to **100** to **300** based on the CPU usage. 
| | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hregion.max.filesize | Indicates the maximum size of an HStoreFile, in bytes. If the size of any HStoreFile exceeds the value of this parameter, the managed Hregion is divided into two parts. | 10737418240 | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hregion.memstore.flush.size | On the RegionServer, when the size of memstore that exists in memory of write operations exceeds **memstore.flush.size**, MemStoreFlusher performs the Flush operation to write the memstore to the corresponding store in the format of HFile. | 134217728 | + | | | | + | | If RegionServer memory is sufficient and active Regions are few, increase the parameter value and reduce compaction times to improve system performance. | | + | | | | + | | The Flush operation may be delayed after it takes place. Write operations continue and memstore keeps increasing during the delay. The maximum size of memstore is: **memstore.flush.size** x **hbase.hregion.memstore.block.multiplier**. When the memstore size exceeds the maximum value, write operations are blocked. Properly increasing the value of **hbase.hregion.memstore.block.multiplier** can reduce the blocks and make performance become more stable. 
Unit: byte | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.regionserver.global.memstore.size | Updates the size of all MemStores supported by the RegionServer before locking or forcible flush. On the RegionServer, the MemStoreFlusher thread performs the flush. The thread regularly checks memory occupied by write operations. When the total memory volume occupied by write operations exceeds the threshold, MemStoreFlusher performs the flush. Larger memstore will be flushed first and then smaller ones until the occupied memory is less than the threshold. | 0.4 | + | | | | + | | Threshold = hbase.regionserver.global.memstore.size x hbase.regionserver.global.memstore.size.lower.limit x HBase_HEAPSIZE | | + | | | | + | | .. note:: | | + | | | | + | | The sum of the parameter value and the value of **hfile.block.cache.size** cannot exceed 0.8, that is, memory occupied by read and write operations cannot exceed 80% of **HeapSize**, ensuring stable running of other operations. | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hstore.blockingStoreFiles | Check whether the number of files is larger than the value of **hbase.hstore.blockingStoreFiles** before you flush regions. | 15 | + | | | | + | | If it is larger than the value of **hbase.hstore.blockingStoreFiles**, perform a compaction and configure **hbase.hstore.blockingWaitTime** to 90s to make the flush delay for 90s. 
During the delay, write operations continue and the memstore size keeps increasing and exceeds the threshold (**memstore.flush.size** x **hbase.hregion.memstore.block.multiplier**), blocking write operations. After compaction is complete, a large number of writes may be generated. As a result, the performance fluctuates sharply. | | + | | | | + | | Increase the value of **hbase.hstore.blockingStoreFiles** to reduce block possibilities. | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.regionserver.thread.compaction.throttle | The compression whose size is greater than the value of this parameter is executed by the large thread pool. The unit is bytes. Indicates a threshold of a total file size for compaction during a Minor Compaction. The total file size affects execution duration of a compaction. If the total file size is large, other compactions or flushes may be blocked. | 1610612736 | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hstore.compaction.min | Indicates the minimum number of HStoreFiles on which minor compaction is performed each time. When the size of a file in a Store exceeds the value of this parameter, the file is compacted. You can increase the value of this parameter to reduce the number of times that the file is compacted. If there are too many files in the Store, read performance will be affected. 
| 6 | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hstore.compaction.max | Indicates the maximum number of HStoreFiles on which minor compaction is performed each time. The functions of the parameter and **hbase.hstore.compaction.max.size** are similar. Both are used to limit the execution duration of one compaction. | 10 | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hstore.compaction.max.size | If the size of an HFile is larger than the parameter value, the HFile will not be compacted in a Minor Compaction but can be compacted in a Major Compaction. | 9223372036854775807 | + | | | | + | | The parameter is used to prevent HFiles of large sizes from being compacted. After a Major Compaction is forbidden, multiple HFiles can exist in a Store and will not be merged into one HFile, without affecting data access performance. The unit is byte. 
| | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase.hregion.majorcompaction | Main compression interval of all HStoreFile files in a region. The unit is milliseconds. Execution of Major Compactions consumes much system resources and will affect system performance during peak hours. | 604800000 | + | | | | + | | If service updates, deletion, and reclamation of expired data space are infrequent, set the parameter to **0** to disable Major Compactions. | | + | | | | + | | If you must perform a Major Compaction to reclaim more space, increase the parameter value and configure the **hbase.offpeak.end.hour** and **hbase.offpeak.start.hour** parameters to make the Major Compaction be triggered in off-peak hours. | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | - hbase.regionserver.maxlogs | - Indicates the threshold for the number of HLog files that are not flushed on a RegionServer. If the number of HLog files is greater than the threshold, the RegionServer forcibly performs flush operations. | - 32 | + | - hbase.regionserver.hlog.blocksize | - Indicates the maximum size of an HLog file. If the size of an HLog file is greater than the value of this parameter, a new HLog file is generated. The old HLog file is disabled and archived. | - 134217728 | + | | | | + | | The two parameters determine the number of HLogs that are not flushed in a RegionServer. When the data volume is less than the total size of memstore, the flush operation is forcibly triggered due to excessive HLog files. 
In this case, you can adjust the values of the two parameters to avoid forcible flush. Unit: byte | | + +-----------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +- **Data writing client tuning** + + It is recommended that data is written in Put List mode if necessary, which greatly improves write performance. The length of each put list needs to be set based on the single put size and parameters of the actual environment. You are advised to do some basic tests before configuring parameters. + +- **Data table writing design optimization** + + .. table:: **Table 2** Parameters affecting real-time data writing + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=======================+==========================================================================================================================================================================================================================================================================+=======================+ + | COMPRESSION | The compression algorithm compresses blocks in HFiles. For compressible data, configure the compression algorithm to efficiently reduce disk I/Os and improve performance. | NONE | + | | | | + | | .. note:: | | + | | | | + | | Some data cannot be efficiently compressed. For example, a compressed figure can hardly be compressed again. The common compression algorithm is SNAPPY, because it has a high encoding/decoding speed and acceptable compression rate. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | BLOCKSIZE | Different block sizes affect HBase data read and write performance. You can configure sizes for blocks in an HFile. Larger blocks have a higher compression rate. However, they have poor performance in random data read, because HBase reads data in a unit of blocks. | 65536 | + | | | | + | | Set the parameter to 128 KB or 256 KB to improve data write efficiency without greatly affecting random read performance. The unit is byte. 
| | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | IN_MEMORY | Whether to cache table data in the memory first, which improves data read performance. If you will frequently access some small tables, set the parameter. | false | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_the_bulkload_efficiency.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_the_bulkload_efficiency.rst new file mode 100644 index 0000000..5d6f066 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/improving_the_bulkload_efficiency.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_1636.html + +.. _mrs_01_1636: + +Improving the BulkLoad Efficiency +================================= + +Scenario +-------- + +BulkLoad uses MapReduce jobs to directly generate files that comply with the internal data format of HBase, and then loads the generated StoreFiles to a running cluster. Compared with HBase APIs, BulkLoad saves more CPU and network resources. + +ImportTSV is an HBase table data loading tool. + +Prerequisites +------------- + +When using BulkLoad, the output path of the file has been specified using the **Dimporttsv.bulk.output** parameter. + +Procedure +--------- + +Add the following parameter to the BulkLoad command when performing a batch loading task: + +.. table:: **Table 1** Parameter for improving BulkLoad efficiency + + +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------+ + | Parameter | Description | Value | + +==========================+=================================================================================================================================================================================================================================================================================================================================+=========================================================+ + | -Dimporttsv.mapper.class | The construction of key-value pairs is moved from the user-defined mapper to reducer to improve performance. The mapper only needs to send the original text in each row to the reducer. The reducer parses the record in each row and creates a key-value) pair. | org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper | + | | | | + | | .. note:: | and | + | | | | + | | When this parameter is set to **org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper**, this parameter is used only when the batch loading command without the *HBASE_CELL_VISIBILITY OR HBASE_CELL_TTL* option is executed. 
The **org.apache.hadoop.hbase.mapreduce.TsvImporterByteMapper** provides better performance. | org.apache.hadoop.hbase.mapreduce.TsvImporterTextMapper | + +--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst new file mode 100644 index 0000000..d8dad88 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_1013.html + +.. _mrs_01_1013: + +HBase Performance Tuning +======================== + +- :ref:`Improving the BulkLoad Efficiency ` +- :ref:`Improving Put Performance ` +- :ref:`Optimizing Put and Scan Performance ` +- :ref:`Improving Real-time Data Write Performance ` +- :ref:`Improving Real-time Data Read Performance ` +- :ref:`Optimizing JVM Parameters ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + improving_the_bulkload_efficiency + improving_put_performance + optimizing_put_and_scan_performance + improving_real-time_data_write_performance + improving_real-time_data_read_performance + optimizing_jvm_parameters diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst new file mode 100644 index 0000000..247ac48 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_1019.html + +.. _mrs_01_1019: + +Optimizing JVM Parameters +========================= + +Scenario +-------- + +When the number of clusters reaches a certain scale, the default settings of the Java virtual machine (JVM) cannot meet the cluster requirements. In this case, the cluster performance deteriorates or the clusters may be unavailable. Therefore, JVM parameters must be properly configured based on actual service conditions to improve the cluster performance. + +Procedure +--------- + +**Navigation path for setting parameters:** + +The JVM parameters related to the HBase role must be configured in the **hbase-env.sh** file in the **${BIGDATA_HOME}/FusionInsight_HD_*/install/FusionInsight-HBase-2.2.3/hbase/conf/** directory. + +Each role has JVM parameter configuration variables, as shown in :ref:`Table 1 `. + +.. _mrs_01_1019__en-us_topic_0000001219350781_t2451c7af790c44cc8f895f6d4dc68b55: + +.. 
table:: **Table 1** HBase-related JVM parameter configuration variables + + +-------------------------+----------------------------------------------------------------+ + | Variable | Affected Role | + +=========================+================================================================+ + | HBASE_OPTS | All roles of HBase | + +-------------------------+----------------------------------------------------------------+ + | SERVER_GC_OPTS | All roles on the HBase server, such as Master and RegionServer | + +-------------------------+----------------------------------------------------------------+ + | CLIENT_GC_OPTS | Client process of HBase | + +-------------------------+----------------------------------------------------------------+ + | HBASE_MASTER_OPTS | Master of HBase | + +-------------------------+----------------------------------------------------------------+ + | HBASE_REGIONSERVER_OPTS | RegionServer of HBase | + +-------------------------+----------------------------------------------------------------+ + | HBASE_THRIFT_OPTS | Thrift of HBase | + +-------------------------+----------------------------------------------------------------+ + +**Configuration example:** + +.. code-block:: + + export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_put_and_scan_performance.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_put_and_scan_performance.rst new file mode 100644 index 0000000..3914d9e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_put_and_scan_performance.rst @@ -0,0 +1,87 @@ +:original_name: mrs_01_1016.html + +.. _mrs_01_1016: + +Optimizing Put and Scan Performance +=================================== + +Scenario +-------- + +HBase has many configuration parameters related to read and write performance. The configuration parameters need to be adjusted based on the read/write request loads. This section describes how to optimize read and write performance by modifying the RegionServer configurations. + +Procedure +--------- + +- JVM GC parameters + + Suggestions on setting the RegionServer **GC_OPTS** parameter: + + - Set **-Xms** and **-Xmx** to the same value based on your needs. Increasing the memory can improve the read and write performance. For details, see the description of **hfile.block.cache.size** in :ref:`Table 2 ` and **hbase.regionserver.global.memstore.size** in :ref:`Table 1 `. + - Set **-XX:NewSize** and **-XX:MaxNewSize** to the same value. You are advised to set the value to **512M** in low-load scenarios and **2048M** in high-load scenarios. + - Set **X-XX:CMSInitiatingOccupancyFraction** to be less than and equal to 90, and it is calculated as follows: **100 x (hfile.block.cache.size + hbase.regionserver.global.memstore.size + 0.05)**. + - **-XX:MaxDirectMemorySize** indicates the non-heap memory used by the JVM. You are advised to set this parameter to **512M** in low-load scenarios and **2048M** in high-load scenarios. + + .. note:: + + The **-XX:MaxDirectMemorySize** parameter is not used by default. If you need to set this parameter, add it to the **GC_OPTS** parameter. + +- Put parameters + + RegionServer processes the data of the put request and writes the data to memstore and HLog. 
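+
+   A minimal client-side sketch of such a put request is shown below. It uses only standard HBase client APIs; the table name **t1**, column family **cf**, and cell values are illustrative only.
+
+   .. code-block::
+
+      // Classes come from the org.apache.hadoop.hbase, org.apache.hadoop.hbase.client,
+      // and org.apache.hadoop.hbase.util packages.
+      void putExample() throws IOException {
+          try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
+               Table table = conn.getTable(TableName.valueOf("t1"))) {
+              Put put = new Put(Bytes.toBytes("row1"));
+              put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"), Bytes.toBytes("value1"));
+              // The RegionServer appends this mutation to the HLog (WAL) and updates the memstore.
+              table.put(put);
+          }
+      }
+
+   As data accumulates on the RegionServer, the following thresholds come into play: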
+ + - When the size of memstore reaches the value of **hbase.hregion.memstore.flush.size**, memstore is updated to HDFS to generate HFiles. + - Compaction is triggered when the number of HFiles in the column cluster of the current region reaches the value of **hbase.hstore.compaction.min**. + - If the number of HFiles in the column cluster of the current region reaches the value of **hbase.hstore.blockingStoreFiles**, the operation of refreshing the memstore and generating HFiles is blocked. As a result, the put request is blocked. + + .. _mrs_01_1016__en-us_topic_0000001219230513_t5194159aa9d34637bba4cdd0aa3b925e: + + .. table:: **Table 1** Put parameters + + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +============================================+====================================================================================================================================================================================================================================================================================================================================================================================================+=======================+ + | hbase.wal.hsync | Indicates whether each WAL is persistent to disks. | true | + | | | | + | | For details, see :ref:`Improving Put Performance `. | | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.hfile.hsync | Indicates whether HFile write operations are persistent to disks. | true | + | | | | + | | For details, see :ref:`Improving Put Performance `. | | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.hregion.memstore.flush.size | If the size of MemStore (unit: Byte) exceeds a specified value, MemStore is flushed to the corresponding disk. The value of this parameter is checked by each thread running **hbase.server.thread.wakefrequency**. It is recommended that you set this parameter to an integer multiple of the HDFS block size. You can increase the value if the memory is sufficient and the put load is heavy. 
| 134217728 | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.regionserver.global.memstore.size | Updates the size of all MemStores supported by the RegionServer before locking or forcible flush. It is recommended that you set this parameter to **hbase.hregion.memstore.flush.size x Number of regions with active writes/RegionServer GC -Xmx**. The default value is **0.4**, indicating that 40% of RegionServer GC -Xmx is used. | 0.4 | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.hstore.flusher.count | Indicates the number of memstore flush threads. You can increase the parameter value in heavy-put-load scenarios. | 2 | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.regionserver.thread.compaction.small | Indicates the number of small compaction threads. You can increase the parameter value in heavy-put-load scenarios. | 10 | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hbase.hstore.blockingStoreFiles | If the number of HStoreFile files in a Store exceeds the specified value, the update of the HRegion will be locked until a compression is completed or the value of **base.hstore.blockingWaitTime** is exceeded. Each time MemStore is flushed, a StoreFile file is written into MemStore. Set this parameter to a larger value in heavy-put-load scenarios. | 15 | + +--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + +- Scan parameters + + .. _mrs_01_1016__en-us_topic_0000001219230513_tcd04a4cfd9f94a80a47de3ccb824175e: + + .. 
table:: **Table 2** Scan parameters + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +=====================================+===============================================================================================================================================================================================================================================================================================================+=================================================================================================================+ + | hbase.client.scanner.timeout.period | Client and RegionServer parameters, indicating the lease timeout period of the client executing the scan operation. You are advised to set this parameter to an integer multiple of 60000 ms. You can set this parameter to a larger value when the read load is heavy. The unit is milliseconds. | 60000 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+ + | hfile.block.cache.size | Indicates the data cache percentage in the RegionServer GC -Xmx. You can increase the parameter value in heavy-read-load scenarios, in order to improve cache hit ratio and performance. It indicates the percentage of the maximum heap (-Xmx setting) allocated to the block cache of HFiles or StoreFiles. | When offheap is disabled, the default value is **0.25**. When offheap is enabled, the default value is **0.1**. | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+ + +- Handler parameters + + .. table:: **Table 3** Handler parameters + + +--------------------------------------+------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +======================================+==============================================================================================================================+===============+ + | hbase.regionserver.handler.count | Indicates the number of RPC server instances on RegionServer. The recommended value ranges from 200 to 400. 
| 200 | + +--------------------------------------+------------------------------------------------------------------------------------------------------------------------------+---------------+ + | hbase.regionserver.metahandler.count | Indicates the number of program instances for processing prioritized requests. The recommended value ranges from 200 to 400. | 200 | + +--------------------------------------+------------------------------------------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/index.rst b/doc/component-operation-guide-lts/source/using_hbase/index.rst new file mode 100644 index 0000000..72f2e2e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/index.rst @@ -0,0 +1,50 @@ +:original_name: mrs_01_0500.html + +.. _mrs_01_0500: + +Using HBase +=========== + +- :ref:`Using HBase from Scratch ` +- :ref:`Creating HBase Roles ` +- :ref:`Using an HBase Client ` +- :ref:`Configuring HBase Replication ` +- :ref:`Enabling Cross-Cluster Copy ` +- :ref:`Supporting Full-Text Index ` +- :ref:`Using the ReplicationSyncUp Tool ` +- :ref:`Configuring HBase DR ` +- :ref:`Performing an HBase DR Service Switchover ` +- :ref:`Configuring HBase Data Compression and Encoding ` +- :ref:`Performing an HBase DR Active/Standby Cluster Switchover ` +- :ref:`Community BulkLoad Tool ` +- :ref:`Configuring the MOB ` +- :ref:`Configuring Secure HBase Replication ` +- :ref:`Configuring Region In Transition Recovery Chore Service ` +- :ref:`Using a Secondary Index ` +- :ref:`HBase Log Overview ` +- :ref:`HBase Performance Tuning ` +- :ref:`Common Issues About HBase ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_hbase_from_scratch + creating_hbase_roles + using_an_hbase_client + configuring_hbase_replication + enabling_cross-cluster_copy + supporting_full-text_index + using_the_replicationsyncup_tool + configuring_hbase_dr + performing_an_hbase_dr_service_switchover + configuring_hbase_data_compression_and_encoding + performing_an_hbase_dr_active_standby_cluster_switchover + community_bulkload_tool + configuring_the_mob + configuring_secure_hbase_replication + configuring_region_in_transition_recovery_chore_service + using_a_secondary_index + hbase_log_overview + hbase_performance_tuning/index + common_issues_about_hbase/index diff --git a/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_active_standby_cluster_switchover.rst b/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_active_standby_cluster_switchover.rst new file mode 100644 index 0000000..3514fe3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_active_standby_cluster_switchover.rst @@ -0,0 +1,78 @@ +:original_name: mrs_01_1611.html + +.. _mrs_01_1611: + +Performing an HBase DR Active/Standby Cluster Switchover +======================================================== + +Scenario +-------- + +The HBase cluster in the current environment is a DR cluster. Due to some reasons, the active and standby clusters need to be switched over. That is, the standby cluster becomes the active cluster, and the active cluster becomes the standby cluster. + +Impact on the System +-------------------- + +After the active and standby clusters are switched over, data cannot be written to the original active cluster, and the original standby cluster becomes the active cluster to take over upper-layer services. 
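+
+For reference, the following is a minimal sketch of how an upper-layer application can be pointed at the cluster that takes over the active role. It uses only standard HBase client APIs; the configuration file path and the table name **t1** are assumptions for illustration and must be replaced with the values of your environment.
+
+.. code-block::
+
+   // Classes come from the org.apache.hadoop.hbase, org.apache.hadoop.hbase.client,
+   // org.apache.hadoop.hbase.util, org.apache.hadoop.conf, and org.apache.hadoop.fs packages.
+   void writeToNewActiveCluster() throws IOException {
+       Configuration conf = HBaseConfiguration.create();
+       // Hypothetical path: load the client configuration obtained from the cluster
+       // that becomes active after the switchover.
+       conf.addResource(new Path("/opt/client/HBase/hbase/conf/hbase-site.xml"));
+       try (Connection conn = ConnectionFactory.createConnection(conf);
+            Table table = conn.getTable(TableName.valueOf("t1"))) {
+           Put put = new Put(Bytes.toBytes("row1"));
+           put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
+           table.put(put); // Writes now land on the cluster that took over the active role.
+       }
+   }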
+ +Procedure +--------- + +**Ensuring that upper-layer services are stopped** + +#. Ensure that the upper-layer services have been stopped. If not, perform operations by referring to :ref:`Performing an HBase DR Service Switchover `. + +**Disabling the write function of the active cluster** + +2. Download and install the HBase client. + +3. On the HBase client of the standby cluster, run the following command as user **hbase** to disable the data write function of the standby cluster: + + **kinit hbase** + + **hbase shell** + + **set_clusterState_standby** + + The command is run successfully if the following information is displayed: + + .. code-block:: + + hbase(main):001:0> set_clusterState_standby + => true + +**Checking whether the active/standby synchronization is complete** + +4. Run the following command to ensure that the current data has been synchronized (SizeOfLogQueue=0 and SizeOfLogToReplicate=0 are required). If the values are not 0, wait and run the following command repeatedly until the values are 0. + + **status 'replication'** + +**Disabling synchronization between the active and standby clusters** + +5. Query all synchronization clusters and obtain the value of **PEER_ID**. + + **list_peers** + +6. Delete all synchronization clusters. + + **remove_peer** *'Standby cluster ID'* + + Example: + + **remove_peer** **'**\ *\ 1\ *\ **'** + +7. Query all synchronized tables. + + **list_replicated_tables** + +8. Disable all synchronized tables queried in the preceding step. + + **disable_table_replication** *'Table name'* + + Example: + + **disable_table_replication** *'t1'* + +**Performing an active/standby switchover** + +9. Reconfigure HBase DR. For details, see :ref:`Configuring HBase DR `. diff --git a/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_service_switchover.rst b/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_service_switchover.rst new file mode 100644 index 0000000..87b8c07 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/performing_an_hbase_dr_service_switchover.rst @@ -0,0 +1,80 @@ +:original_name: mrs_01_1610.html + +.. _mrs_01_1610: + +Performing an HBase DR Service Switchover +========================================= + +Scenario +-------- + +The system administrator can configure HBase cluster DR to improve system availability. If the active cluster in the DR environment is faulty and the connection to the HBase upper-layer application is affected, you need to configure the standby cluster information for the HBase upper-layer application so that the application can run in the standby cluster. + +Impact on the System +-------------------- + +After a service switchover, data written to the standby cluster is not synchronized to the active cluster by default. Add the active cluster is recovered, the data newly generated in the standby cluster needs to be synchronized to the active cluster by backup and recovery. If automatic data synchronization is required, you need to switch over the active and standby HBase DR clusters. + +Procedure +--------- + +#. Log in to FusionInsight Manager of the standby cluster. + +#. Download and install the HBase client. + +#. On the HBase client of the standby cluster, run the following command as user **hbase** to enable the data writing status in the standby cluster. + + **kinit hbase** + + **hbase shell** + + **set_clusterState_active** + + The command is run successfully if the following information is displayed: + + .. 
code-block:: + + hbase(main):001:0> set_clusterState_active + => true + +#. Check whether the original configuration files **hbase-site.xml**, **core-site.xml**, and **hdfs-site.xml** of the HBase upper-layer application are modified to adapt to the application running. + + - If yes, update the related content to the new configuration file and replace the old configuration file. + - If no, use the new configuration file to replace the original configuration file of the HBase upper-layer application. + +#. Configure the network connection between the host where the HBase upper-layer application is located and the standby cluster. + + .. note:: + + If the host where the client is installed is not a node in the cluster, configure network connections for the client to prevent errors when you run commands on the client. + + a. Ensure that the host where the client is installed can communicate with the hosts listed in the **hosts** file in the directory where the client installation package is decompressed. + b. If the host where the client is located is not a node in the cluster, you need to set the mapping between the host name and the IP address (service plan) in the /etc/hosts file on the host. The host names and IP addresses must be mapped one by one. + +#. Set the time of the host where the HBase upper-layer application is located to be the same as that of the standby cluster. The time difference must be less than 5 minutes. + +#. Check the authentication mode of the active cluster. + + - If the security mode is used, go to :ref:`8 `. + - If the normal mode is used, no further action is required. + +#. .. _mrs_01_1610__en-us_topic_0000001173470894_l5002f6a291d5455895e03939d56eae5c: + + Obtain the **keytab** and **krb5.conf** configuration files of the HBase upper-layer application user. + + a. On FusionInsight Manager of the standby cluster, choose **System** > **Permission** > **User**. + b. Locate the row that contains the target user, click **More** > **Download Authentication Credential** in the **Operation** column, and download the **keytab** file to the local PC. + c. Decompress the package to obtain **user.keytab** and **krb5.conf**. + +#. Use the **user.keytab** and **krb5.conf** files to replace the original files in the HBase upper-layer application. + +#. Stop upper-layer applications. + +#. Determine whether to switch over the active and standby HBase clusters. If the switchover is not performed, data will not be synchronized. + + - If yes, switch over the active and standby HBase DR clusters. For details, see :ref:`Performing an HBase DR Active/Standby Cluster Switchover `. Then, go to :ref:`12 `. + - If no, go to :ref:`12 `. + +#. .. _mrs_01_1610__en-us_topic_0000001173470894_li11189185214483: + + Start the upper-layer services. diff --git a/doc/component-operation-guide-lts/source/using_hbase/supporting_full-text_index.rst b/doc/component-operation-guide-lts/source/using_hbase/supporting_full-text_index.rst new file mode 100644 index 0000000..67e169f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/supporting_full-text_index.rst @@ -0,0 +1,117 @@ +:original_name: mrs_01_0493.html + +.. _mrs_01_0493: + +Supporting Full-Text Index +========================== + +You can create tables and indexes using **createTable** of **org.apache.luna.client.LunaAdmin** and specify table names, column family names, requests for creating indexes, as well as the directory path for storing mapping files. 
You can also add indexes to existing tables using **addCollection** and obtain tables to perform a scan operation by using **getTable** of **org.apache.luna.client.LunaAdmin**. + +The column name and column family name of a table consist of letters, digits, and underscores (_) but cannot contain any special characters. + +HBase tables with a full-text index have the following limitations: + +- Disaster recovery, backup, and restoration are not supported. +- Rows and column families cannot be deleted. +- Tight consistency is not supported by the query on Solr. + +Code Snippet Example +-------------------- + +The following code snippet belongs to the **testFullTextScan** method in the **LunaSample** class of the **hbase.examples** package. + +.. code-block:: + + public static void testFullTextScan() throws Exception { + /** + * Create create request of Solr. Specify collection name, confset name, + * number of shards, and number of replication factor. + */ + Create create = new Create(); + create.setCollectionName(COLLECTION_NAME); + create.setConfigName(CONFSET_NAME); + create.setNumShards(NUM_OF_SHARDS); + create.setReplicationFactor(NUM_OF_REPLICATIONFACTOR); + /** + * Create mapping. Specify index fields(mandatory) and non-index + * fields(optional). + */ + List indexedFields = new ArrayList(); + indexedFields.add(new ColumnField("name", "f:n")); + indexedFields.add(new ColumnField("cat", "f:t")); + indexedFields.add(new ColumnField("features", "f:d")); + Mapping mapping = new Mapping(indexedFields); + /** + * Create table descriptor of HBase. + */ + HTableDescriptor desc = new HTableDescriptor(HBASE_TABLE); + desc.addFamily(new HColumnDescriptor(TABLE_FAMILY)); + /** + * Create table and collection at the same time. + */ + LunaAdmin admin = null; + try { + admin = new AdminSingleton().getAdmin(); + admin.deleteTable(HBASE_TABLE); + if (!admin.tableExists(HBASE_TABLE)) { + admin.createTable(desc, Bytes.toByteArrays(new String[] { "0", "1", "2", "3", "4" }), + create, mapping); + } + /** + * Put data. + */ + Table table = admin.getTable(HBASE_TABLE); + int i = 0; + while (i < 5) { + byte[] row = Bytes.toBytes(i + "+sohrowkey"); + Put put = new Put(row); + put.addColumn(TABLE_FAMILY, Bytes.toBytes("n"), Bytes.toBytes("ZhangSan" + i)); + put.addColumn(TABLE_FAMILY, Bytes.toBytes("t"), Bytes.toBytes("CO" + i)); + put.addColumn(TABLE_FAMILY, Bytes.toBytes("d"), Bytes.toBytes("Male, Leader of M.O" + i)); + table.put(put); + i++; + } + + /** + * Scan table. + */ + Scan scan = new Scan(); + SolrQuery query = new SolrQuery(); + query.setQuery("name:ZhangSan1 AND cat:CO1"); + Filter filter = new FullTextFilter(query, COLLECTION_NAME); + scan.setFilter(filter); + ResultScanner scanner = table.getScanner(scan); + LOG.info("-----------------records----------------"); + for (Result r = scanner.next(); r != null; r = scanner.next()) { + for (Cell cell : r.rawCells()) { + LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":" + + Bytes.toString(CellUtil.cloneFamily(cell)) + "," + + Bytes.toString(CellUtil.cloneQualifier(cell)) + "," + + Bytes.toString(CellUtil.cloneValue(cell))); + } + } + LOG.info("-------------------end------------------"); + /** + * Delete collection. + */ + admin.deleteCollection(HBASE_TABLE, COLLECTION_NAME); + + /** + * Delete table. + */ + admin.deleteTable(HBASE_TABLE); + } catch (IOException e) { + e.printStackTrace(); + } finally { + /** + * When everything done, close LunaAdmin. 
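+       * In this sketch, admin may still be null at this point if getAdmin() failed above,
+       * so production code would add a null check before calling close().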
+ */ + admin.close(); + } + } + +Precautions +----------- + +- Tables and indexes to be created must be unique. +- Use LunaAdmin only to obtain tables to perform a scan operation. diff --git a/doc/component-operation-guide-lts/source/using_hbase/using_a_secondary_index.rst b/doc/component-operation-guide-lts/source/using_hbase/using_a_secondary_index.rst new file mode 100644 index 0000000..206ffa2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/using_a_secondary_index.rst @@ -0,0 +1,80 @@ +:original_name: mrs_01_1635.html + +.. _mrs_01_1635: + +Using a Secondary Index +======================= + +Scenario +-------- + +HIndex enables HBase indexing based on specific column values, making the retrieval of data highly efficient and fast. + +Constraints +----------- + +- Column families are separated by semicolons (;). + +- Columns and data types must be contained in square brackets ([]). + +- The column data type is specified by using -> after the column name. + +- If the column data type is not specified, the default data type (string) is used. + +- The number sign (#) is used to separate two index details. + +- The following is an optional parameter: + + -Dscan.caching: number of cached rows when the data table is scanned. + + The default value is set to 1000. + +- Indexes are created for a single region to repair damaged indexes. + + This function is not used to generate new indexes. + +Procedure +--------- + +#. Install the HBase client. For details, see :ref:`Using an HBase Client `. + +#. Go to the client installation directory, for example, **/opt/client**. + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, user authentication is not required. + + **kinit** *Component service user* + +#. Run the following command to access HIndex: + + **hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer** + + .. 
table:: **Table 1** Common HIndex commands + + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Command | + +==================================+================================================================================================================================================================================+ + | Add Index | TableIndexer-Dtablename.to.index=table1-Dindexspecs.to.add='IDX1=>cf1:[q1->datatype],[q2],[q3];cf2:[q1->datatype],[q2->datatype]#IDX2=>cf1:[q5]' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Create Index | TableIndexer -Dtablename.to.index=table1 -Dindexnames.to.build='IDX1#IDX2' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Delete Index | TableIndexer -Dtablename.to.index=table1 -Dindexnames.to.drop='IDX1#IDX2' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Disable Index | TableIndexer -Dtablename.to.index=table1 -Dindexnames.to.disable='IDX1#IDX2' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Add and Create Index | TableIndexer -Dtablename.to.index=table1 -Dindexspecs.to.add='IDX1=>cf1:[q1->datatype],[q2],[q3];cf2:[q1->datatype],[q2->datatype]#IDX2=>cf1:[q5] -Dindexnames.to.build='IDX1' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Create Index for a Single Region | TableIndexer -Dtablename.to.index=table1 -Dregion.to.index=regionEncodedName -Dindexnames.to.build='IDX1#IDX2' | + +----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + - **IDX1**: indicates the index name. + - **cf1**: indicates the column family name. + - **q1**: indicates the column name. + - **datatype**: indicates the data type, including String, Integer, Double, Float, Long, Short, Byte and Char. diff --git a/doc/component-operation-guide-lts/source/using_hbase/using_an_hbase_client.rst b/doc/component-operation-guide-lts/source/using_hbase/using_an_hbase_client.rst new file mode 100644 index 0000000..e323c41 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/using_an_hbase_client.rst @@ -0,0 +1,76 @@ +:original_name: mrs_01_24041.html + +.. _mrs_01_24041: + +Using an HBase Client +===================== + +Scenario +-------- + +This section describes how to use the HBase client in an O&M scenario or a service scenario. + +Prerequisites +------------- + +- The client has been installed. For example, the installation directory is **/opt/hadoopclient**. 
The client directory in the following operations is only an example. Change it to the actual installation directory. + +- Service component users are created by the administrator as required. + + A machine-machine user needs to download the **keytab** file and a human-machine user needs to change the password upon the first login. + +- If a non-**root** user uses the HBase client, ensure that the owner of the HBase client directory is this user. Otherwise, run the following command to change the owner. + + **chown user:group -R** *Client installation directory*\ **/HBase** + +Using the HBase Client +---------------------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client directory: + + **cd** **/opt/hadoopclient** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The current user must have the permission to create HBase tables. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *Component service user* + + For example, **kinit hbaseuser**. + +#. Run the following HBase client command: + + **hbase shell** + +Common HBase client commands +---------------------------- + +The following table lists common HBase client commands. + +.. table:: **Table 1** HBase client commands + + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Command | Description | + +==========+=================================================================================================================================================================================================================================+ + | create | Used to create a table, for example, **create 'test', 'f1', 'f2', 'f3'**. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | disable | Used to disable a specified table, for example, **disable 'test'**. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | enable | Used to enable a specified table, for example, **enable 'test'**. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | alter | Used to alter the table structure. You can run the **alter** command to add, modify, or delete column family information and table-related parameter values, for example, **alter 'test', {NAME => 'f3', METHOD => 'delete'}**. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | describe | Used to obtain the table description, for example, **describe 'test'**. 
| + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | drop | Used to delete a specified table, for example, **drop 'test'**. Before deleting a table, you must stop it. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | put | Used to write the value of a specified cell, for example, **put 'test','r1','f1:c1','myvalue1'**. The cell location is unique and determined by the table, row, and column. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | get | Used to get the value of a row or the value of a specified cell in a row, for example, **get 'test','r1'**. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | scan | Used to query table data, for example, **scan 'test'**. The table name and scanner must be specified in the command. | + +----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hbase/using_hbase_from_scratch.rst b/doc/component-operation-guide-lts/source/using_hbase/using_hbase_from_scratch.rst new file mode 100644 index 0000000..f1d03c7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/using_hbase_from_scratch.rst @@ -0,0 +1,113 @@ +:original_name: mrs_01_0368.html + +.. _mrs_01_0368: + +Using HBase from Scratch +======================== + +HBase is a column-based distributed storage system that features high reliability, performance, and scalability. This section describes how to use HBase from scratch, including how to update the client on the Master node in the cluster, create a table using the client, insert data in the table, modify the table, read data from the table, delete table data, and delete the table. + +Background +---------- + +Suppose a user develops an application to manage users who use service A in an enterprise. The procedure of operating service A on the HBase client is as follows: + +- Create the **user_info** table. +- Add users' educational backgrounds and titles to the table. +- Query user names and addresses by user ID. +- Query information by user name. +- Deregister users and delete user data from the user information table. +- Delete the user information table after service A ends. + +.. _mrs_01_0368__en-us_topic_0000001173789254_en-us_topic_0229422393_en-us_topic_0173178212_en-us_topic_0037446806_table27353390: + +.. 
table:: **Table 1** User information + + =========== ==== ====== === ======= + ID Name Gender Age Address + =========== ==== ====== === ======= + 12005000201 A Male 19 City A + 12005000202 B Female 23 City B + 12005000203 C Male 26 City C + 12005000204 D Male 18 City D + 12005000205 E Female 21 City E + 12005000206 F Male 32 City F + 12005000207 G Female 29 City G + 12005000208 H Female 30 City H + 12005000209 I Male 26 City I + 12005000210 J Male 25 City J + =========== ==== ====== === ======= + +Prerequisites +------------- + +The client has been installed. For example, the client is installed in the **/opt/client** directory. The client directory in the following operations is only an example. Change it to the actual installation directory. Before using the client, download and update the client configuration file, and ensure that the active management node of Manager is available. + +Procedure +--------- + +#. Use the client on the active management node. + + a. Log in to the node where the client is installed as the client installation user and run the following command to switch to the client directory: + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The current user must have the permission to create HBase tables. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *MRS cluster user* + + For example, **kinit hbaseuser**. + + d. Run the following HBase client command: + + **hbase shell** + +#. Run the following commands on the HBase client to implement service A. + + a. Create the **user_info** user information table according to :ref:`Table 1 ` and add data to it. + + **create** '*user_info*',{**NAME** => 'i'} + + For example, to add information about the user whose ID is **12005000201**, run the following commands: + + **put** '*user_info*','*12005000201*','**i:name**','*A*' + + **put** '*user_info*','*12005000201*','**i:gender**','*Male*' + + **put** '*user_info*','*12005000201*','**i:age**','*19*' + + **put** '*user_info*','*12005000201*','**i:address**','*City A*' + + b. Add users' educational backgrounds and titles to the **user_info** table. + + For example, to add educational background and title information about user 12005000201, run the following commands: + + **put** '*user_info*','*12005000201*','**i:degree**','*master*' + + **put** '*user_info*','*12005000201*','**i:pose**','*manager*' + + c. Query user names and addresses by user ID. + + For example, to query the name and address of user 12005000201, run the following command: + + **scan**'*user_info*',{**STARTROW**\ =>'*12005000201*',\ **STOPROW**\ =>'*12005000201*',\ **COLUMNS**\ =>['**i:name**','**i:address**']} + + d. Query information by user name. + + For example, to query information about user A, run the following command: + + **scan**'*user_info*',{**FILTER**\ =>"SingleColumnValueFilter('i','name',=,'binary:*A*')"} + + e. Delete user data from the user information table. + + All user data needs to be deleted. For example, to delete data of user 12005000201, run the following command: + + **delete**'*user_info*','*12005000201*','i' + + f. Delete the user information table. 
+
+   **disable** '*user_info*'; **drop** '*user_info*'
diff --git a/doc/component-operation-guide-lts/source/using_hbase/using_the_replicationsyncup_tool.rst b/doc/component-operation-guide-lts/source/using_hbase/using_the_replicationsyncup_tool.rst
new file mode 100644
index 0000000..e072471
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hbase/using_the_replicationsyncup_tool.rst
@@ -0,0 +1,56 @@
+:original_name: mrs_01_0510.html
+
+.. _mrs_01_0510:
+
+Using the ReplicationSyncUp Tool
+================================
+
+Prerequisites
+-------------
+
+#. The active and standby clusters have been installed and started.
+#. The time of the active and standby clusters is consistent, and the NTP service on both clusters uses the same time source.
+#. When the HBase service of the active cluster is stopped, the ZooKeeper and HDFS services must be started and running.
+#. ReplicationSyncUp must be run by the system user who starts the HBase process.
+#. In security mode, ensure that the HBase system user of the standby cluster has the read permission on HDFS of the active cluster, because the tool updates the ZooKeeper nodes and HDFS files of the HBase system.
+#. When HBase of the active cluster is faulty, the ZooKeeper, file system, and network of the active cluster must still be available.
+
+Scenarios
+---------
+
+The replication mechanism can use WAL to synchronize the state of a cluster with the state of another cluster. After HBase replication is enabled, if the active cluster is faulty, ReplicationSyncUp synchronizes incremental data from the active cluster to the standby cluster using the information from the ZooKeeper node. After data synchronization is complete, the standby cluster can be used as an active cluster.
+
+Parameter Configuration
+-----------------------
+
++------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
+| Parameter                          | Description                                                                                                                                                                                              | Default Value |
++====================================+==========================================================================================================================================================================================================+===============+
+| hbase.replication.bulkload.enabled | Whether to enable the bulkload data replication function. The parameter value type is Boolean. To enable the bulkload data replication function, set this parameter to **true** for the active cluster. | **false**     |
++------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
+| hbase.replication.cluster.id       | ID of the source HBase cluster. After bulkload data replication is enabled, this parameter is mandatory and must be defined in the source cluster. The parameter value type is String.                  | ``-``         |
++------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
+
+Tool Usage
+----------
+
+Run the following command on the client of the active cluster:
+
+**hbase org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp -Dreplication.sleep.before.failover=1**
+
+.. note::
+
+   **replication.sleep.before.failover** indicates the sleep time required for replicating the remaining data when a RegionServer fails to start. You are advised to set this parameter to 1 second to quickly trigger replication.
+
+Precautions
+-----------
+
+#. When the active cluster is stopped, this tool obtains the WAL processing progress and WAL processing queues from the ZooKeeper node (RS znode) and replicates the queues that have not yet been replicated to the standby cluster.
+#. Each RegionServer of the active cluster has its own znode under the replication node of ZooKeeper in the standby cluster. It contains one znode for each peer cluster.
+#. If a RegionServer is faulty, each RegionServer in the active cluster receives a notification through the watcher and attempts to lock the znode of the faulty RegionServer, including its queues. The RegionServer that successfully creates the lock transfers all the queues to the znode of its own queue. After the queues are transferred, they are deleted from the old location.
+#. When the active cluster is stopped, ReplicationSyncUp synchronizes data between the active and standby clusters using the information from the ZooKeeper node. In addition, WALs of the RegionServer znode will be moved to the standby cluster.
+
+Restrictions and Limitations
+----------------------------
+
+If the standby cluster is stopped or the peer relationship is closed, the tool runs normally but the peer relationship cannot be replicated.
diff --git a/doc/component-operation-guide-lts/source/using_hdfs/balancing_datanode_capacity.rst b/doc/component-operation-guide-lts/source/using_hdfs/balancing_datanode_capacity.rst
new file mode 100644
index 0000000..48344a3
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hdfs/balancing_datanode_capacity.rst
@@ -0,0 +1,188 @@
+:original_name: mrs_01_1667.html
+
+.. _mrs_01_1667:
+
+Balancing DataNode Capacity
+===========================
+
+Scenario
+--------
+
+In an HDFS cluster, unbalanced disk usage among DataNodes may occur, for example, when new DataNodes are added to the cluster. Unbalanced disk usage may result in multiple problems. For example, MapReduce applications cannot make full use of local computing advantages, network bandwidth usage between data nodes cannot be optimal, or some node disks cannot be used. Therefore, the system administrator needs to periodically check and maintain DataNode data balance.
+
+HDFS provides a capacity balancing program, Balancer. By running Balancer, you can balance the HDFS cluster and ensure that the difference between the disk usage of each DataNode and that of the HDFS cluster does not exceed the threshold. DataNode disk usage before and after balancing is shown in :ref:`Figure 1 ` and :ref:`Figure 2 `, respectively.
+
+.. _mrs_01_1667__en-us_topic_0000001219231321_ff269b77c9222460985503fffcebf980e:
+
+.. figure:: /_static/images/en-us_image_0000001295899880.png
+   :alt: **Figure 1** DataNode disk usage before balancing
+
+   **Figure 1** DataNode disk usage before balancing
+
+.. 
_mrs_01_1667__en-us_topic_0000001219231321_fee19cefb9d104f238448abdcf62f1e49: + +.. figure:: /_static/images/en-us_image_0000001295739916.png + :alt: **Figure 2** DataNode disk usage after balancing + + **Figure 2** DataNode disk usage after balancing + +The time of the balancing operation is affected by the following two factors: + +#. Total amount of data to be migrated: + + The data volume of each DataNode must be greater than (Average usage - Threshold) x Average data volume and less than (Average usage + Threshold) x Average data volume. If the actual data volume is less than the minimum value or greater than the maximum value, imbalance occurs. The system sets the largest deviation volume on all DataNodes as the total data volume to be migrated. + +#. Balancer migration is performed in sequence in iteration mode. The amount of data to be migrated in each iteration does not exceed 10 GB, and the usage of each iteration is recalculated. + +Therefore, for a cluster, you can estimate the time consumed by each iteration (by observing the time consumed by each iteration recorded in balancer logs) and divide the total data volume by 10 GB to estimate the task execution time. + +The balancer can be started or stopped at any time. + +Impact on the System +-------------------- + +- The balance operation occupies network bandwidth resources of DataNodes. Perform the operation during maintenance based on service requirements. +- The balance operation may affect the running services if the bandwidth traffic (the default bandwidth control is 20 MB/s) is reset or the data volume is increased. + +Prerequisites +------------- + +The client has been installed. + +Procedure +--------- + +#. Log in to the node where the client is installed as a client installation user. Run the following command to switch to the client installation directory, for example, **/opt/client**: + + **cd /opt/client** + + .. note:: + + If the cluster is in normal mode, run the **su - omm** command to switch to user **omm**. + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, run the following command to authenticate the HDFS identity: + + **kinit hdfs** + +#. Determine whether to adjust the bandwidth control. + + - If yes, go to :ref:`5 `. + - If no, go to :ref:`6 `. + +#. .. _mrs_01_1667__en-us_topic_0000001219231321_l91d088e58d8d4bbbb720317d843ca5d3: + + Run the following command to change the maximum bandwidth of Balancer, and then go to :ref:`6 `. + + **hdfs dfsadmin -setBalancerBandwidth ** + + ** indicates the bandwidth control value, in bytes. For example, to set the bandwidth control to 20 MB/s (the corresponding value is 20971520), run the following command: + + **hdfs dfsadmin -setBalancerBandwidth** **20971520** + + .. note:: + + - The default bandwidth control is 20 MB/s. This value is applicable to the scenario where the current cluster uses the 10GE network and services are being executed. If the service idle time window is insufficient for balance maintenance, you can increase the value of this parameter to shorten the balance time, for example, to 209715200 (200 MB/s). + - The value of this parameter depends on the networking. If the cluster load is high, you can change the value to 209715200 (200 MB/s). If the cluster is idle, you can change the value to 1073741824 (1 GB/s). 
+ - If the bandwidth of the DataNodes cannot reach the specified maximum bandwidth, modify the HDFS parameter **dfs.datanode.balance.max.concurrent.moves** on FusionInsight Manager, and change the number of threads for balancing on each DataNode to **32** and restart the HDFS service. + +#. .. _mrs_01_1667__en-us_topic_0000001219231321_ld8ed77b8b7c745308eea6a68de2f4233: + + Run the following command to start the balance task: + + **bash /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold ** + + **-threshold** specifies the deviation value of the DataNode disk usage, which is used for determining whether the HDFS data is balanced. When the difference between the disk usage of each DataNode and the average disk usage of the entire HDFS cluster is less than this threshold, the system considers that the HDFS cluster has been balanced and ends the balance task. + + For example, to set deviation rate to 5%, run the following command: + + **bash /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold 5** + + .. note:: + + - The preceding command executes the task in the background. You can query related logs in the **hadoop-root-balancer-**\ *host name*\ **.out log** file in the **/opt/client/HDFS/hadoop/logs** directory of the host. + + - To stop the balance task, run the following command: + + **bash /opt/client/HDFS/hadoop/sbin/stop-balancer.sh** + + - If only data on some nodes needs to be balanced, you can add the **-include** parameter in the script to specify the nodes to be migrated. You can run commands to view the usage of different parameters. + + - **/opt/client** is the client installation directory. If the directory is inconsistent, replace it. + + - If the command fails to be executed and the error information **Failed to APPEND_FILE /system/balancer.id** is displayed in the log, run the following command to forcibly delete **/system/balancer.id** and run the **start-balancer.sh** script again: + + **hdfs dfs -rm -f /system/balancer.id** + +#. If the following information is displayed, the balancing is complete and the system automatically exits the task: + + .. code-block:: + + Apr 01, 2016 01:01:01 PM Balancing took 23.3333 minutes + + After you run the script in :ref:`6 `, the **hadoop-root-balancer-**\ *Host name*\ **.out log** file is generated in the client installation directory **/opt/client/HDFS/hadoop/logs**. You can view the following information in the log: + + - Time Stamp + - Bytes Already Moved + - Bytes Left To Move + - Bytes Being Moved + +Related Tasks +------------- + +**Enable automatic execution of the balance task** + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Configurations**, select **All Configurations**, search for the following parameters, and change the parameter values. + + - **dfs.balancer.auto.enable** indicates whether to enable automatic balance task execution. The default value **false** indicates that automatic balance task execution is disabled. The value **true** indicates that automatic execution is enabled. + + - **dfs.balancer.auto.cron.expression** indicates the task execution time. The default value **0 1 \* \* 6** indicates that the task is executed at 01:00 every Saturday. This parameter is valid only when the automatic execution is enabled. + + :ref:`Table 1 ` describes the expression for modifying this parameter. **\*** indicates consecutive time segments. + + .. _mrs_01_1667__en-us_topic_0000001219231321_t3d64fdb3254a42beaed3c5e4c7087501: + + .. 
table:: **Table 1** Parameters in the execution expression + + ====== =========================================================== + Column Description + ====== =========================================================== + 1 Minute. The value ranges from 0 to 59. + 2 Hour. The value ranges from 0 to 23. + 3 Date. The value ranges from 1 to 31. + 4 Month. The value ranges from 1 to 12. + 5 Week. The value ranges from 0 to 6. **0** indicates Sunday. + ====== =========================================================== + + - **dfs.balancer.auto.stop.cron.expression** indicates the task ending time. The default value is empty, indicating that the running balance task is not automatically stopped. For example, **0 5 \* \* 6** indicates that the balance task is stopped at 05:00 every Saturday. This parameter is valid only when the automatic execution is enabled. + + :ref:`Table 1 ` describes the expression for modifying this parameter. **\*** indicates consecutive time segments. + +#. Running parameters of the balance task that is automatically executed are shown in :ref:`Table 2 `. + + .. _mrs_01_1667__en-us_topic_0000001219231321_tc3bff391b0c14479916d9097f5e28238: + + .. table:: **Table 2** Running parameters of the automatic balancer + + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | Parameter | Parameter description | Default Value | + +=====================================+===========================================================================================================================================================================================================================================================================================================================================================================+=====================================+ + | dfs.balancer.auto.threshold | Specifies the balancing threshold of the disk capacity percentage. This parameter is valid only when **dfs.balancer.auto.enable** is set to **true**. | 10 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | dfs.balancer.auto.exclude.datanodes | Specifies the list of DataNodes on which automatic disk balancing is not required. This parameter is valid only when **dfs.balancer.auto.enable** is set to **true**. | The value is left blank by default. 
| + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | dfs.balancer.auto.bandwidthPerSec | Specifies the maximum bandwidth (MB/s) of each DataNode for load balancing. | 20 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | dfs.balancer.auto.maxIdleIterations | Specifies the maximum number of consecutive idle iterations of Balancer. An idle iteration is an iteration without moving blocks. When the number of consecutive idle iterations reaches the maximum number, the balance task ends. The value **-1** indicates infinity. | 5 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + | dfs.balancer.auto.maxDataNodesNum | Controls the number of DataNodes that perform automatic balance tasks. Assume that the value of this parameter is *N*. If *N* is greater than 0, data is balanced between *N* DataNodes with the highest percentage of remaining space and *N* DataNodes with the lowest percentage of remaining space. If *N* is 0, data is balanced among all DataNodes in the cluster. | 5 | + +-------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------+ + +#. Click **Save** to make configurations take effect. You do not need to restart the HDFS service. + + Go to the **/var/log/Bigdata/hdfs/nn/hadoop-omm-balancer-**\ *Host name*\ **.log** file to view the task execution logs saved in the active NameNode. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/changing_the_datanode_storage_directory.rst b/doc/component-operation-guide-lts/source/using_hdfs/changing_the_datanode_storage_directory.rst new file mode 100644 index 0000000..7ec1f36 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/changing_the_datanode_storage_directory.rst @@ -0,0 +1,171 @@ +:original_name: mrs_01_1664.html + +.. 
_mrs_01_1664:
+
+Changing the DataNode Storage Directory
+=======================================
+
+Scenario
+--------
+
+If the storage directory defined by the HDFS DataNode is incorrect or the HDFS storage plan changes, the system administrator needs to modify the DataNode storage directory on FusionInsight Manager to ensure that the HDFS works properly. Changing the DataNode storage directory includes the following scenarios:
+
+- Change the storage directory of the DataNode role. In this way, the storage directories of all DataNode instances are changed.
+- Change the storage directory of a single DataNode instance. In this way, only the storage directory of this instance is changed, and the storage directories of other instances remain the same.
+
+Impact on the System
+--------------------
+
+- The HDFS service needs to be stopped and restarted during the process of changing the storage directory of the DataNode role, and the cluster cannot provide services before it is completely started.
+
+- The DataNode instance needs to be stopped and restarted during the process of changing the storage directory of the instance, and the instance at this node cannot provide services before it is started.
+- The directory for storing service parameter configurations must also be updated.
+
+Prerequisites
+-------------
+
+- New disks have been prepared and installed on each data node, and the disks are formatted.
+
+- New directories have been planned for storing data in the original directories.
+- The HDFS client has been installed.
+- The system administrator user **hdfs** is available.
+- When changing the storage directory of a single DataNode instance, ensure that the number of active DataNode instances is greater than the value of **dfs.replication**.
+
+Procedure
+---------
+
+**Check the environment.**
+
+#. Log in to the server where the HDFS client is installed as user **root**, and run the following command to configure environment variables:
+
+   **source** *Installation directory of the HDFS client*\ **/bigdata_env**
+
+#. If the cluster is in security mode, run the following command to authenticate the user:
+
+   **kinit hdfs**
+
+#. Run the following command on the HDFS client to check whether all directories and files in the HDFS root directory are normal:
+
+   **hdfs fsck /**
+
+   Check the fsck command output.
+
+   - If the following information is displayed, no file is lost or damaged. Go to :ref:`4 `.
+
+     .. code-block::
+
+        The filesystem under path '/' is HEALTHY
+
+   - If other information is displayed, some files are lost or damaged. Go to :ref:`5 `.
+
+#. .. _mrs_01_1664__en-us_topic_0000001219350541_le587d508c49b4837bcabd9bd9cf98bc4:
+
+   Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services**, and check whether **Running Status** of HDFS is **Normal**.
+
+   - If yes, go to :ref:`6 `.
+   - If no, the HDFS status is unhealthy. Go to :ref:`5 `.
+
+#. .. _mrs_01_1664__en-us_topic_0000001219350541_l1ce08f0a7d2349b487dd6f19c38c7273:
+
+   Rectify the HDFS fault. The task is complete.
+
+#. .. _mrs_01_1664__en-us_topic_0000001219350541_lff55f0ef8699449ab4cfc4eddeed1711:
+
+   Determine whether to change the storage directory of the DataNode role or that of a single DataNode instance:
+
+   - To change the storage directory of the DataNode role, go to :ref:`7 `.
+   - To change the storage directory of a single DataNode instance, go to :ref:`12 `.
+
+**Changing the storage directory of the DataNode role**
+
+7. .. 
_mrs_01_1664__en-us_topic_0000001219350541_l4bc534684e1d4d3cb656e4ed55bb75af: + + Choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Stop Instance** to stop the HDFS service. + +8. Log in to each data node where the HDFS service is installed as user **root** and perform the following operations: + + a. Create a target directory (**data1** and **data2** are original directories in the cluster). + + For example, to create a target directory **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following commands: + + **mkdir** **${BIGDATA_DATA_HOME}/hadoop/data3** and **mkdir ${BIGDATA_DATA_HOME}/hadoop/data3/dn** + + b. Mount the target directory to the new disk. For example, mount **${BIGDATA_DATA_HOME}/hadoop/data3** to the new disk. + + c. Modify permissions on the new directory. + + For example, to create a target directory **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following commands: + + **chmod 700** **${BIGDATA_DATA_HOME}/hadoop/data3/dn -R** and **chown omm:wheel** **${BIGDATA_DATA_HOME}/hadoop/data3/dn -R** + + d. .. _mrs_01_1664__en-us_topic_0000001219350541_l63f4856203e9425f9a23113c3d13f665: + + Copy the data to the target directory. + + For example, if the old directory is **${BIGDATA_DATA_HOME}/hadoop/data1/dn** and the target directory is **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following command: + + **cp -af** **${BIGDATA_DATA_HOME}/hadoop/data1/dn/\*** **${BIGDATA_DATA_HOME}/hadoop/data3/dn** + +9. On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Configurations** > **All Configurations** to go to the HDFS service configuration page. + + Change the value of **dfs.datanode.data.dir** from the default value **%{@auto.detect.datapart.dn}** to the new target directory, for example, **${BIGDATA_DATA_HOME}/hadoop/data3/dn**. + + For example, the original data storage directories are **/srv/BigData/hadoop/data1**, **/srv/BigData/hadoop/data2**. To migrate data from the **/srv/BigData/hadoop/data1** directory to the newly created **/srv/BigData/hadoop/data3** directory, replace the whole parameter with **/srv/BigData/hadoop/data2, /srv/BigData/hadoop/data3**. Separate multiple storage directories with commas (,). In this example, changed directories are **/srv/BigData/hadoop/data2**, **/srv/BigData/hadoop/data3**. + +10. Click **Save**. Choose **Cluster** > *Name of the desired cluster* > **Services**. On the page that is displayed, start the services that have been stopped. + +11. After the HDFS is started, run the following command on the HDFS client to check whether all directories and files in the HDFS root directory are correctly copied: + + **hdfs fsck /** + + Check the fsck command output. + + - If the following information is displayed, no file is lost or damaged, and data replication is successful. No further action is required. + + .. code-block:: + + The filesystem under path '/' is HEALTHY + + - If other information is displayed, some files are lost or damaged. In this case, check whether :ref:`8.d ` is correct and run the **hdfs fsck** *Name of the damaged file* **-delete** command. + +**Changing the storage directory of a single DataNode instance** + +12. .. _mrs_01_1664__en-us_topic_0000001219350541_lab34cabb4d324166acebeb18e1098884: + + Choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Instance**. Select the HDFS instance whose storage directory needs to be modified, and choose **More** > **Stop Instance**. + +13. 
Log in to the DataNode node as user **root**, and perform the following operations: + + a. Create a target directory. + + For example, to create a target directory **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following commands: + + **mkdir** **${BIGDATA_DATA_HOME}/hadoop/data3** and **mkdir ${BIGDATA_DATA_HOME}/hadoop/data3/dn** + + b. Mount the target directory to the new disk. + + For example, mount **${BIGDATA_DATA_HOME}/hadoop/data3** to the new disk. + + c. Modify permissions on the new directory. + + For example, to create a target directory **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following commands: + + **chmod 700** **${BIGDATA_DATA_HOME}/hadoop/data3/dn -R** and **chown omm:wheel** **${BIGDATA_DATA_HOME}/hadoop/data3/dn -R** + + d. Copy the data to the target directory. + + For example, if the old directory is **${BIGDATA_DATA_HOME}/hadoop/data1/dn** and the target directory is **${BIGDATA_DATA_HOME}/hadoop/data3/dn**, run the following command: + + **cp -af** **${BIGDATA_DATA_HOME}/hadoop/data1/dn/\*** **${BIGDATA_DATA_HOME}/hadoop/data3/dn** + +14. On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Service** > **HDFS** > **Instance**. Click the specified DataNode instance and go to the **Configurations** page. + + Change the value of **dfs.datanode.data.dir** from the default value **%{@auto.detect.datapart.dn}** to the new target directory, for example, **${BIGDATA_DATA_HOME}/hadoop/data3/dn**. + + For example, the original data storage directories are **/srv/BigData/hadoop/data1,/srv/BigData/hadoop/data2**. To migrate data from the **/srv/BigData/hadoop/data1** directory to the newly created **/srv/BigData/hadoop/data3** directory, replace the whole parameter with **/srv/BigData/hadoop/data2,/srv/BigData/hadoop/data3**. + +15. Click **Save**, and then click **OK**. + + **Operation succeeded** is displayed. click **Finish**. + +16. Choose **More** > **Restart Instance** to restart the DataNode instance. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_encrypted_channels.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_encrypted_channels.rst new file mode 100644 index 0000000..f755cdb --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_encrypted_channels.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_0810.html + +.. _mrs_01_0810: + +Configuring Encrypted Channels +============================== + +Scenario +-------- + +Encrypted channel is an encryption protocol of remote procedure call (RPC) in HDFS. When a user invokes RPC, the user's login name will be transmitted to RPC through RPC head. Then RPC uses Simple Authentication and Security Layer (SASL) to determine an authorization protocol (Kerberos and DIGEST-MD5) to complete RPC authorization. When users deploy security clusters, they need to use encrypted channels and configure the following parameters. For details about the secure Hadoop RPC, visit https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC. + +Configuration Description +------------------------- + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. 
table:: **Table 1** Parameter description + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ + | Parameter | Description | Default Value | + +=======================+==========================================================================================================================================================================================================================+================================+ + | hadoop.rpc.protection | .. important:: | - Security mode: privacy | + | | | - Normal mode: authentication | + | | NOTICE: | | + | | | | + | | - The setting takes effect only after the service is restarted. Rolling restart is not supported. | | + | | - After the setting, you need to download the client configuration again. Otherwise, the HDFS cannot provide the read and write services. | | + | | | | + | | Whether the RPC channels of each module in Hadoop are encrypted. The channels include: | | + | | | | + | | - RPC channels for clients to access HDFS | | + | | - RPC channels between modules in HDFS, for example, RPC channels between DataNode and NameNode | | + | | - RPC channels for clients to access Yarn | | + | | - RPC channels between NodeManager and ResourceManager | | + | | - RPC channels for Spark to access Yarn and HDFS | | + | | - RPC channels for MapReduce to access Yarn and HDFS | | + | | - RPC channels for HBase to access HDFS | | + | | | | + | | .. note:: | | + | | | | + | | You can set this parameter on the HDFS component configuration page. The parameter setting takes effect globally, that is, the setting of whether the RPC channel is encrypted takes effect on all modules in Hadoop. | | + | | | | + | | There are three encryption modes. | | + | | | | + | | - **authentication**: This is the default value in normal mode. In this mode, data is directly transmitted without encryption after being authenticated. This mode ensures performance but has security risks. | | + | | - **integrity**: Data is transmitted without encryption or authentication. To ensure data security, exercise caution when using this mode. | | + | | - **privacy**: This is the default value in security mode, indicating that data is transmitted after authentication and encryption. This mode reduces the performance. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_directory_permission.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_directory_permission.rst new file mode 100644 index 0000000..324b156 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_directory_permission.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_0797.html + +.. _mrs_01_0797: + +Configuring HDFS Directory Permission +===================================== + +Scenario +-------- + +The permission for some HDFS directories is **777** or **750** by default, which brings potential security risks. You are advised to modify the permission for the HDFS directories after the HDFS is installed to increase user security. 
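+
+Before changing any permissions, you can first check the current mode bits of the directories you plan to harden. The following is a minimal sketch; the user name and the directory list are only examples (they mirror the directories listed in the procedure below), so adjust them to your cluster:
+
+.. code-block:: bash
+
+   # In a security-mode cluster, authenticate first (user name is an example)
+   kinit hdfs
+
+   # -d lists the directories themselves rather than their contents,
+   # so the first column shows the current permission of each directory
+   hdfs dfs -ls -d /user /mr-history /mr-history/tmp /mr-history/done /user/mapred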
+
+Procedure
+---------
+
+Log in to the HDFS client as the administrator and run the following command to modify the permission for the **/user** directory.
+
+The permission is set to **1777**, that is, **1** (the sticky bit) is added to the original permission. This indicates that only the user who created a file or directory can delete it.
+
+**hdfs dfs -chmod 1777** */user*
+
+To ensure the security of system files, you are advised to harden the security of non-temporary directories. The following directories and their default permissions are examples:
+
+- /user: 777
+- /mr-history: 777
+- /mr-history/tmp: 777
+- /mr-history/done: 777
+- /user/mapred: 755
diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_diskbalancer.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_diskbalancer.rst
new file mode 100644
index 0000000..c2190fd
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_diskbalancer.rst
@@ -0,0 +1,91 @@
+:original_name: mrs_01_1678.html
+
+.. _mrs_01_1678:
+
+Configuring HDFS DiskBalancer
+=============================
+
+Scenario
+--------
+
+DiskBalancer is an online disk balancer that balances disk data on running DataNodes based on various indicators. It works in a similar way to the HDFS Balancer. The difference is that HDFS Balancer balances data between DataNodes, while HDFS DiskBalancer balances data among the disks on a single DataNode.
+
+Data among disks may be unevenly distributed if a large number of files have been deleted from a cluster running for a long time, or disk capacity expansion is performed on a node in the cluster. Uneven data distribution may deteriorate the concurrent read/write performance of HDFS, or cause service failures due to inappropriate HDFS write policies. In this case, the data density among the disks on a node needs to be balanced to prevent heterogeneous small disks from becoming the performance bottleneck of the node.
+
+Configuration Description
+-------------------------
+
+Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `.
+
+.. 
table:: **Table 1** Parameter description + + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +=================================================+========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+===============+ + | dfs.disk.balancer.auto.enabled | Indicates whether to enable the HDFS DiskBalancer function. The default value is **false**, indicating that this function is disabled. | false | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.auto.cron.expression | CRON expression of the HDFS disk balancing operation, which is used to control the start time of the balancing operation. This parameter is valid only when **dfs.disk.balancer.auto.enabled** is set to **true**. The default value is **0 1 \* \* 6**, indicating that tasks are executed at 01:00 every Saturday. For details about cron expression, see :ref:`Table 2 `. The default value indicates that the DiskBalancer check is executed at 01:00 every Saturday. | 0 1 \* \* 6 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.max.disk.throughputInMBperSec | Specifies the maximum disk bandwidth that can be used for disk data balancing. The unit is MB/s, and the default value is **10**. Set this parameter based on the actual disk conditions of the cluster. 
| 10 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.max.disk.errors | Specifies the maximum number of errors that are allowed in a specified movement process. If the value exceeds this threshold, the movement fails. | 5 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.block.tolerance.percent | Specifies the difference threshold between the data storage capacity and perfect status of each disk during data balancing among disks. For example, the ideal data storage capacity of each disk is 1 TB, and this parameter is set to **10**. When the data storage capacity of the target disk reaches 900 GB, the storage status of the disk is considered as perfect. Value range: 1 to 100. | 10 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.plan.threshold.percent | Specifies the data density difference that is allowed between two disks during disk data balancing. If the absolute value of the data density difference between any two disks exceeds the threshold, data balancing is required. Value range: 1 to 100. | 10 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.disk.balancer.top.nodes.number | Specifies the top *N* nodes whose disk data needs to be balanced in the cluster. 
| 5 | + +-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + +To use this function, set **dfs.disk.balancer.auto.enabled** to **true** and configure a proper CRON expression. Set other parameters based on the cluster status. + +.. _mrs_01_1678__en-us_topic_0000001219231277_t300bd34219e2403e8c5b2bb068fc4bec: + +.. table:: **Table 2** CRON expressions + + ====== =========================================================== + Column Description + ====== =========================================================== + 1 Minute. The value ranges from 0 to 59. + 2 Hour. The value ranges from 0 to 23. + 3 Date. The value ranges from 1 to 31. + 4 Month. The value ranges from 1 to 12. + 5 Week. The value ranges from 0 to 6. **0** indicates Sunday. + ====== =========================================================== + +Use Restrictions +---------------- + +#. Data can only be moved between disks of the same type. For example, data can only be moved between SSDs or between DISKs. + +#. Enabling this function occupies disk I/O resources and network bandwidth resources of involved nodes. Enable this function in off-peak hours. + +#. The DataNodes specified by the **dfs.disk.balancer.top.nodes.number** parameter is frequently calculated. Therefore, set the parameter to a small value. + +#. Commands for using the DiskBalancer function on the HDFS client are as follows: + + .. _mrs_01_1678__en-us_topic_0000001219231277_table897643715231: + + .. table:: **Table 3** DiskBalancer commands + + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Syntax | Description | + +============================================================+====================================================================================================================================================================================================================================================================+ + | hdfs diskbalancer -report -top | Set *N* to an integer greater than 0. This command can be used to query the top *N* nodes that require disk data balancing in the cluster. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs diskbalancer -plan | This command can be used to generate a JSON file based on the DataNode. The file contains information about the source disk, target disk, and blocks to be moved. In addition, this command can be used to specify other parameters such as the network bandwidth. 
| + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs diskbalancer -query | The default port number of the cluster is 9867. This command is used to query the running status of the DiskBalancer task on the current node. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs diskbalancer -execute | In this command, **planfile** indicates the JSON file generated in the second command. Use the absolute path. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hdfs diskbalancer -cancel | This command is used to cancel the running planfile. Use the absolute path. | + +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. note:: + + - Users running this command on the client must have the **supergroup** permission. You can use the system user **hdfs** of the HDFS service. Alternatively, you can create a user with the **supergroup** permission in the cluster and then run the command. + - Only formats and usage of commands are provided in :ref:`Table 3 `. For more parameters to be configured for each command, run the **hdfs diskbalancer -help ** command to view detailed information. + - When you troubleshoot performance problems during the cluster O&M, check whether the HDFS disk balancing occurs in the event information of the cluster. If yes, check whether DiskBalancer is enabled in the cluster. + - After the automatic DiskBalancer function is enabled, the ongoing task stops only after the current data balancing is complete. The task cannot be canceled during the balancing. + - You can manually specify certain nodes for data balancing on the client. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_nodelabel.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_nodelabel.rst new file mode 100644 index 0000000..3765003 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_hdfs_nodelabel.rst @@ -0,0 +1,234 @@ +:original_name: mrs_01_1676.html + +.. _mrs_01_1676: + +Configuring HDFS NodeLabel +========================== + +Scenario +-------- + +You need to configure the nodes for storing HDFS file data blocks based on data features. You can configure a label expression to an HDFS directory or file and assign one or more labels to a DataNode so that file data blocks can be stored on specified DataNodes. 
+ +If the label-based data block placement policy is used for selecting DataNodes to store the specified files, the DataNode range is specified based on the label expression. Then proper nodes are selected from the specified range. + +- Scenario 1: DataNodes partitioning scenario + + Scenario description: + + When different application data is required to run on different nodes for separate management, label expressions can be used to achieve separation of different services, storing specified services on corresponding nodes. + + By configuring the NodeLabel feature, you can perform the following operations: + + - Store data in **/HBase** to DN1, DN2, DN3, and DN4. + - Store data in **/Spark** to DN5, DN6, DN7, and DN8. + + .. _mrs_01_1676__en-us_topic_0000001219029801_f29094c7c7de94c108e1f8ddea541eab7: + + .. figure:: /_static/images/en-us_image_0000001349259201.png + :alt: **Figure 1** DataNode partitioning scenario + + **Figure 1** DataNode partitioning scenario + + .. note:: + + - Run the **hdfs nodelabel -setLabelExpression -expression 'LabelA[fallback=NONE]' -path /Hbase** command to set an expression for the **Hbase** directory. As shown in :ref:`Figure 1 `, the data block replicas of files in the **/Hbase** directory are placed on the nodes labeled with the **LabelA**, that is, DN1, DN2, DN3, and DN4. Similarly, run the **hdfs nodelabel -setLabelExpression -expression 'LabelB[fallback=NONE]' -path /Spark** command to set an expression for the Spark directory. Data block replicas of files in the **/Spark** directory can be placed only on nodes labeled with **LabelB**, that is, DN5, DN6, DN7, and DN8. + - For details about how to set labels for a data node, see :ref:`Configuration Description `. + - If multiple racks are available in one cluster, it is recommended that DataNodes of these racks should be available under each label, to ensure reliability of data block placement. + +- Scenario 2: Specifying replica location when there are multiple racks + + Scenario description: + + In a heterogeneous cluster, customers need to allocate certain nodes with high availability to store important commercial data. Label expressions can be used to specify replica location so that the replica can be placed on a high reliable node. + + Data blocks in the **/data** directory have three replicas by default. In this case, at least one replica is stored on a node of RACK1 or RACK2 (nodes of RACK1 and RACK2 are high reliable), and the other two are stored separately on the nodes of RACK3 and RACK4. + + + .. figure:: /_static/images/en-us_image_0000001349059745.png + :alt: **Figure 2** Scenario example + + **Figure 2** Scenario example + + .. note:: + + Run the **hdfs nodelabel -setLabelExpression -expression 'LabelA||LabelB[fallback=NONE],LabelC,LabelD' -path /data** command to set an expression for the **/data** directory. + + When data is to be written to the **/data** directory, at least one data block replica is stored on a node labeled with the LabelA or LabelB, and the other two data block replicas are stored separately on the nodes labeled with the LabelC and LabelD. + +.. _mrs_01_1676__en-us_topic_0000001219029801_s7752fba8102e4f20ae2c86f564e2114c: + +Configuration Description +------------------------- + +- DataNode label configuration + + Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + + .. 
table:: **Table 1** Parameter description + + +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +================================+==========================================================================================================================================================================================================================================================================================================================================================================================================+==================================================================================+ + | dfs.block.replicator.classname | Used to configure the DataNode policy of HDFS. | org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy | + | | | | + | | To enable the NodeLabel function, set this parameter to **org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeLabel**. | | + +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ + | host2tags | Used to configure a mapping between a DataNode host and a label. | ``-`` | + | | | | + | | The host name can be configured with an IP address extension expression (for example, **192.168.1.[1-128]** or **192.168.[2-3].[1-128]**) or a regular expression (for example, **/datanode-[123]/** or **/datanode-\\d{2}/**) starting and ending with a slash (/). The label configuration name cannot contain the following characters: = / \\ **Note**: The IP address must be a service IP address. | | + +--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+ + + .. note:: + + - The **host2tags** configuration item is described as follows: + + Assume there are 20 DataNodes which range from dn-1 to dn-20 in a cluster and the IP addresses of clusters range from 10.1.120.1 to 10.1.120.20. The value of **host2tags** can be represented in either of the following methods: + + **Regular expression of the host name** + + **/dn-\\d/ = label-1** indicates that the labels corresponding to dn-1 to dn-9 are label-1, that is, dn-1 = label-1, dn-2 = label-1, ..., dn-9 = label-1. 
+ + **/dn-((1[0-9]$)|(20$))/ = label-2** indicates that the labels corresponding to dn-10 to dn-20 are label-2, that is, dn-10 = label-2, dn-11 = label-2, ...dn-20 = label-2. + + **IP address range expression** + + **10.1.120.[1-9] = label-1** indicates that the labels corresponding to 10.1.120.1 to 10.1.120.9 are label-1, that is, 10.1.120.1 = label-1, 10.1.120.2 = label-1, ..., and 10.1.120.9 = label-1. + + **10.1.120.[10-20] = label-2** indicates that the labels corresponding to 10.1.120.10 to 10.1.120.20 are label-2, that is, 10.1.120.10 = label-2, 10.1.120.11 = label-2, ..., and 10.1.120.20 = label-2. + + - Label-based data block placement policies are applicable to capacity expansion and reduction scenarios. + + A newly added DataNode will be assigned a label if the IP address of the DataNode is within the IP address range in the **host2tags** configuration item or the host name of the DataNode matches the host name regular expression in the **host2tags** configuration item. + + For example, the value of **host2tags** is **10.1.120.[1-9] = label-1**, but the current cluster has only three DataNodes: 10.1.120.1 to 10.1.120.3. If DataNode 10.1.120.4 is added for capacity expansion, the DataNode is labeled as label-1. If the 10.1.120.3 DataNode is deleted or out of the service, no data block will be allocated to the node. + +- Set label expressions for directories or files. + + - On the HDFS parameter configuration page, configure **path2expression** to configure the mapping between HDFS directories and labels. If the configured HDFS directory does not exist, the configuration can succeed. When a directory with the same name as the HDFS directory is created manually, the configured label mapping relationship will be inherited by the directory within 30 minutes. After a labeled directory is deleted, a new directory with the same name as the deleted one will inherit its mapping within 30 minutes. + - For details about configuring items using commands, see the **hdfs nodelabel -setLabelExpression** command. + - To set label expressions using the Java API, invoke the **setLabelExpression(String src, String labelExpression)** method using the instantiated object NodeLabelFileSystem. *src* indicates a directory or file path on HDFS, and **labelExpression** indicates the label expression. + +- After the NodeLabel is enabled, you can run the **hdfs nodelabel -listNodeLabels** command to view the label information of each DataNode. + +Block Replica Location Selection +-------------------------------- + +Nodelabel supports different placement policies for replicas. The expression **label-1,label-2,label-3** indicates that three replicas are respectively placed in DataNodes containing label-1, label-2, and label-3. Different replica policies are separated by commas (,). + +If you want to place two replicas in DataNode with label-1, set the expression as follows: **label-1[replica=2],label-2,label-3**. In this case, if the default number of replicas is 3, two nodes with label-1 and one node with label-2 are selected. If the default number of replicas is 4, two nodes with label-1, one node with label-2, and one node with label-3 are selected. Note that the number of replicas is the same as that of each replica policy from left to right. However, the number of replicas sometimes exceeds the expressions. If the default number of replicas is 5, the extra replica is placed on the last node, that is, the node labeled with label-3. 
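+
+As a minimal illustration, such a placement expression can be bound to a directory with the same **hdfs nodelabel -setLabelExpression** command used earlier in this section. The directory **/tmp/labeltest** is only a placeholder; replace it with the actual path to be labeled.
+
+.. code-block::
+
+   hdfs nodelabel -setLabelExpression -expression 'label-1[replica=2],label-2,label-3' -path /tmp/labeltest
+
+With the default replica factor of 3, two replicas of each block are then placed on nodes carrying label-1 and one on a node carrying label-2; the label-3 part of the expression is used only when a file is written with a larger replica factor.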
+ +When the ACLs function is enabled and the user does not have the permission to access the labels used in the expression, the DataNode with the label is not selected for the replica. + +Deletion of Redundant Block Replicas +------------------------------------ + +If the number of block replicas exceeds the value of **dfs.replication** (number of file replicas specified by the user), HDFS will delete redundant block replicas to ensure cluster resource usage. + +The deletion rules are as follows: + +- Preferentially delete replicas that do not meet any expression. + + For example: The default number of file replicas is **3**. + + The label expression of **/test** is **LA[replica=1],LB[replica=1],LC[replica=1]**. + + The file replicas of **/test** are distributed on four nodes (D1 to D4), corresponding to labels (LA to LD). + + .. code-block:: + + D1:LA + D2:LB + D3:LC + D4:LD + + Then, block replicas on node D4 will be deleted. + +- If all replicas meet the expressions, delete the redundant replicas which are beyond the number specified by the expression. + + For example: The default number of file replicas is **3**. + + The label expression of **/test** is **LA[replica=1],LB[replica=1],LC[replica=1]**. + + The file replicas of **/test** are distributed on the following four nodes, corresponding to the following labels. + + .. code-block:: + + D1:LA + D2:LA + D3:LB + D4:LC + + Then, block replicas on node D1 or D2 will be deleted. + +- If a file owner or group of a file owner cannot access a label, preferentially delete the replica from the DataNode mapped to the label. + +Example of label-based block placement policy +--------------------------------------------- + +Assume that there are six DataNodes, namely, dn-1, dn-2, dn-3, dn-4, dn-5, and dn-6 in a cluster and the corresponding IP address range is 10.1.120.[1-6]. Six directories must be configured with label expressions. The default number of block replicas is **3**. + +- The following provides three expressions of the DataNode label in **host2labels** file. The three expressions have the same function. + + - Regular expression of the host name + + .. code-block:: + + /dn-[1456]/ = label-1,label-2 + /dn-[26]/ = label-1,label-3 + /dn-[3456]/ = label-1,label-4 + /dn-5/ = label-5 + + - IP address range expression + + .. code-block:: + + 10.1.120.[1-6] = label-1 + 10.1.120.1 = label-2 + 10.1.120.2 = label-3 + 10.1.120.[3-6] = label-4 + 10.1.120.[4-6] = label-2 + 10.1.120.5 = label-5 + 10.1.120.6 = label-3 + + - Common host name expression + + .. code-block:: + + /dn-1/ = label-1, label-2 + /dn-2/ = label-1, label-3 + /dn-3/ = label-1, label-4 + /dn-4/ = label-1, label-2, label-4 + /dn-5/ = label-1, label-2, label-4, label-5 + /dn-6/ = label-1, label-2, label-3, label-4 + +- The label expressions of the directories are set as follows: + + .. code-block:: + + /dir1 = label-1 + /dir2 = label-1 && label-3 + /dir3 = label-2 || label-4[replica=2] + /dir4 = (label-2 || label-3) && label-4 + /dir5 = !label-1 + /sdir2.txt = label-1 && label-3[replica=3,fallback=NONE] + /dir6 = label-4[replica=2],label-2 + + .. note:: + + For details about the label expression configuration, see the **hdfs nodelabel -setLabelExpression** command. + + The file data block storage locations are as follows: + + - Data blocks of files in the **/dir1** directory can be stored on any of the following nodes: dn-1, dn-2, dn-3, dn-4, dn-5, and dn-6. + - Data blocks of files in the **/dir2** directory can be stored on the dn-2 and dn-6 nodes. 
The default number of block replicas is **3**. The expression matches only two DataNodes. The third replica will be stored on one of the remaining nodes in the cluster. + - Data blocks of files in the **/dir3** directory can be stored on any three of the following nodes: dn-1, dn-3, dn-4, dn-5, and dn-6. + - Data blocks of files in the **/dir4** directory can be stored on the dn-4, dn-5, and dn-6 nodes. + - Data blocks of files in the **/dir5** directory do not match any DataNode and will be stored on any three nodes in the cluster, which is the same as the default block selection policy. + - For the data blocks of the **/sdir2.txt** file, two replicas are stored on the dn-2 and dn-6 nodes. The left one is not stored in the node because **fallback=NONE** is enabled. + - Data blocks of the files in the **/dir6** directory are stored on the two nodes with label-4 selected from dn-3, dn-4, dn-5, and dn-6 and another node with label-2. If the specified number of file replicas in the **/dir6** directory is more than 3, the extra replicas will be stored on a node with label-2. + +Restrictions +------------ + +In configuration files, **key** and **value** are separated by equation signs (=), colons (:), and whitespace. Therefore, the host name of the **key** cannot contain these characters because these characters may be considered as separators. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_memory_management.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_memory_management.rst new file mode 100644 index 0000000..d245efa --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_memory_management.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_0791.html + +.. _mrs_01_0791: + +Configuring Memory Management +============================= + +Scenario +-------- + +In HDFS, each file object needs to register corresponding information in the NameNode and occupies certain storage space. As the number of files increases, if the original memory space cannot store the corresponding information, you need to change the memory size. + +Configuration Description +------------------------- + +**Navigation path for setting parameters:** + +Go to the **All Configurations** page of HDFS by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. 
table:: **Table 1** Parameter description + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +=======================+========================================================================================================================================================================================================================================================+==========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | GC_PROFILE | The NameNode memory size depends on the size of FsImage, which can be calculated based on the following formula: FsImage size = Number of files x 900 bytes. You can estimate the memory size of the NameNode of HDFS based on the calculation result. | custom | + | | | | + | | The value range of this parameter is as follows: | | + | | | | + | | - **high**: 4 GB | | + | | - **medium**: 2 GB | | + | | - **low**: 256 MB | | + | | - **custom**: The memory size can be set according to the data size in GC_OPTS. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | GC_OPTS | JVM parameter used for garbage collection (GC). This parameter is valid only when **GC_PROFILE** is set to **custom**. Ensure that the **GC_OPT** parameter is set correctly. Otherwise, the process will fail to be started. 
| -Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Djdk.tls.ephemeralDHKeySize=2048 | + | | | | + | | .. important:: | | + | | | | + | | NOTICE: | | + | | Exercise caution when you modify the configuration. If the configuration is incorrect, the services are unavailable. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_nfs.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_nfs.rst new file mode 100644 index 0000000..f44a529 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_nfs.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_1665.html + +.. _mrs_01_1665: + +Configuring NFS +=============== + +Scenario +-------- + +Before deploying a cluster, you can deploy a Network File System (NFS) server based on requirements to store NameNode metadata to enhance data reliability. + +If the NFS server has been deployed and NFS services are configured, you can follow operations in this section to configure NFS on the cluster. These operations are optional. + +Procedure +--------- + +#. Check the permission of the shared NFS directories on the NFS server to ensure that the server can access NameNode in the MRS cluster. + +#. .. _mrs_01_1665__en-us_topic_0000001219351281_lff3e9b51a9354f89ab59a1c515495818: + + Log in to the active NameNode as user **root**. + +#. Run the following commands to create a directory and assign it write permissions: + + **mkdir** **${BIGDATA_DATA_HOME}/namenode-nfs** + + **chown omm:wheel** **${BIGDATA_DATA_HOME}/namenode-nfs** + + **chmod 750** **${BIGDATA_DATA_HOME}/namenode-nfs** + +#. .. _mrs_01_1665__en-us_topic_0000001219351281_lbb64192db9814446b3744fcbf6326d7b: + + Run the following command to mount the NFS to the active NameNode: + + **mount -t nfs -o rsize=8192,wsize=8192,soft,nolock,timeo=3,intr** *IP address of the NFS server*:*Shared directory* **${BIGDATA_DATA_HOME}/namenode-nfs** + + For example, if the IP address of the NFS server is **192.168.0.11** and the shared directory is **/opt/Hadoop/NameNode**, run the following command: + + **mount -t nfs -o rsize=8192,wsize=8192,soft,nolock,timeo=3,intr 192.168.0.11:/opt/Hadoop/NameNode** **${BIGDATA_DATA_HOME}/namenode-nfs** + +#. Perform :ref:`2 ` to :ref:`4 ` on the standby NameNode. + + .. 
note:: + + The names of the shared directories (for example, **/opt/Hadoop/NameNode**) created on the NFS server by the active and standby NameNodes must be different. + +#. Log in to FusionInsight Manager, and choose **Cluster** > *Name of the desired cluster* > **Service** > **HDFS** > **Configuration** > **All Configurations**. + +#. In the search box, search for **dfs.namenode.name.dir**, add **${BIGDATA_DATA_HOME}/namenode-nfs** to **Value**, and click **Save**. Separate paths with commas (,). + +#. Click **OK**. On the **Dashboard** tab page, choose **More** > **Restart Service** to restart the service. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_replica_replacement_policy_for_heterogeneous_capacity_among_datanodes.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_replica_replacement_policy_for_heterogeneous_capacity_among_datanodes.rst new file mode 100644 index 0000000..88528bc --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_replica_replacement_policy_for_heterogeneous_capacity_among_datanodes.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_0804.html + +.. _mrs_01_0804: + +Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes +================================================================================= + +Scenario +-------- + +By default, NameNode randomly selects a DataNode to write files. If the disk capacity of some DataNodes in a cluster is inconsistent (the total disk capacity of some nodes is large and of some nodes is small), the nodes with small disk capacity will be fully written. To resolve this problem, change the default disk selection policy for data written to DataNode to the available space block policy. This policy increases the probability of writing data blocks to the node with large available disk space. This ensures that the node usage is balanced when disk capacity of DataNodes is inconsistent. + +Impact on the System +-------------------- + +The disk selection policy is changed to **org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy**. It is proven that the HDFS file write performance optimizes by 3% after the modification. + +.. note:: + + **The default replica storage policy of the NameNode is as follows:** + + #. First replica: stored on the node where the client resides. + #. Second replica: stored on DataNodes of the remote rack. + #. Third replica: stored on different nodes of the same rack for the node where the client resides. + + If there are more replicas, randomly store them on other DataNodes. + + The replica selection mechanism (**org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy**) is as follows: + + #. First replica: stored on the DataNode where the client resides (the same as the default storage policy). + #. Second replica: + + - When selecting a storage node, select two data nodes that meet the requirements. + - Compare the disk usages of the two DataNodes. If the difference is smaller than 5%, store the replicas to the first node. + - If the difference exceeds 5%, there is a 60% probability (specified by **dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction** and default value is **0.6**) that the replica is written to the node whose disk space usage is low. + + #. As for the storage of the third replica and subsequent replicas, refer to that of the second replica. 
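+
+A quick way to check which placement policy the client-side configuration currently points to is to query the parameter from an installed HDFS client. This is only a sketch; it assumes the client environment variables have been loaded and that the command reads the client configuration files, so update or re-download the client after changing the parameter on the server.
+
+.. code-block::
+
+   hdfs getconf -confKey dfs.block.replicator.classname
+
+If the available space block placement policy is in effect, the command returns **org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy**.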
+ +Prerequisites +------------- + +The total disk capacity deviation of DataNodes in the cluster cannot exceed 100%. + +Procedure +--------- + +#. Go to the **All Configurations** page of HDFS by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. Modify the disk selection policy parameters when HDFS writes data. Search for the **dfs.block.replicator.classname** parameter and change its value to **org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy**. +#. Save the modified configuration. Restart the expired service or instance for the configuration to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_reserved_percentage_of_disk_usage_on_datanodes.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_reserved_percentage_of_disk_usage_on_datanodes.rst new file mode 100644 index 0000000..67caf98 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_reserved_percentage_of_disk_usage_on_datanodes.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_1675.html + +.. _mrs_01_1675: + +Configuring Reserved Percentage of Disk Usage on DataNodes +========================================================== + +Scenario +-------- + +When the Yarn local directory and DataNode directory are on the same disk, the disk with larger capacity can run more tasks. Therefore, more intermediate data is stored in the Yarn local directory. + +Currently, you can set **dfs.datanode.du.reserved** to configure the absolute value of the reserved disk space on DataNodes. A small value cannot meet the requirements of a disk with large capacity. However, configuring a large value for a disk with same capacity wastes a lot of disk space. + +To avoid this problem, a new parameter **dfs.datanode.du.reserved.percentage** is introduced to configure the reserved percentage of the disk space. + +.. note:: + + - If **dfs.datanode.du.reserved.percentage** and **dfs.datanode.du.reserved** are configured at the same time, the larger value of the reserved disk space calculated using the two parameters is used as the reserved space of the data nodes. + - You are advised to set **dfs.datanode.du.reserved** or **dfs.datanode.du.reserved.percentage** based on the actual disk space. + +Configuration Description +------------------------- + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** Parameter description + + +-------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=====================================+======================================================================================================================================================+=======================+ + | dfs.datanode.du.reserved.percentage | Indicates the percentage of the reserved disk space on DataNodes. The DataNode permanently reserves the disk space calculated using this percentage. | 10 | + | | | | + | | The value is an integer ranging from 0 to 100. 
| | + +-------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_damaged_disk_volume.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_damaged_disk_volume.rst new file mode 100644 index 0000000..495ef75 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_damaged_disk_volume.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_1669.html + +.. _mrs_01_1669: + +Configuring the Damaged Disk Volume +=================================== + +Scenario +-------- + +In the open source version, if multiple data storage volumes are configured for a DataNode, the DataNode stops providing services by default if one of the volumes is damaged. You can change the value of **dfs.datanode.failed.volumes.tolerated** to specify the number of damaged disk volumes that are allowed. If the number of damaged volumes does not exceed the threshold, DataNode continues to provide services. + +Configuration Description +------------------------- + +**Navigation path for setting parameters:** + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** Parameter description + + +---------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +=======================================+==============================================================================================================================================================================================================================================================================================================================================+===============+ + | dfs.datanode.failed.volumes.tolerated | Specifies the number of damaged volumes that are allowed before the DataNode stops providing services. By default, there must be at least one valid volume. The value **-1** indicates that the minimum value of a valid volume is **1**. The value greater than or equal to **0** indicates the number of damaged volumes that are allowed. | -1 | + +---------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_namenode_blacklist.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_namenode_blacklist.rst new file mode 100644 index 0000000..2485caa --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_namenode_blacklist.rst @@ -0,0 +1,54 @@ +:original_name: mrs_01_1670.html + +.. 
_mrs_01_1670: + +Configuring the NameNode Blacklist +================================== + +Scenario +-------- + +In the existing default DFSclient failover proxy provider, if a NameNode in a process is faulty, all HDFS client instances in the same process attempt to connect to the NameNode again. As a result, the application waits for a long time and timeout occurs. + +When clients in the same JVM process connect to the NameNode that cannot be accessed, the system is overloaded. The NameNode blacklist is equipped with the MRS cluster to avoid this problem. + +In the new Blacklisting DFSClient failover provider, the faulty NameNode is recorded in a list. The DFSClient then uses the information to prevent the client from connecting to such NameNodes again. This function is called NameNode blacklisting. + +For example, there is a cluster with the following configurations: + +namenode: nn1, nn2 + +dfs.client.failover.connection.retries: 20 + +Processes in a single JVM: 10 clients + +In the preceding cluster, if the active **nn1** cannot be accessed, client1 will retry the connection for 20 times. Then, a failover occurs, and client1 will connect to **nn2**. In the same way, other clients also connect to **nn2** when the failover occurs after retrying the connection to **nn1** for 20 times. Such process prolongs the fault recovery of NameNode. + +In this case, the NameNode blacklisting adds **nn1** to the blacklist when client1 attempts to connect to the active **nn1** which is already faulty. Therefore, other clients will avoid trying to connect to **nn1** but choose **nn2** directly. + +.. note:: + + If, at any time, all NameNodes are added to the blacklist, the content in the blacklist will be cleared, and the client attempts to connect to the NameNodes based on the initial NameNode list. If any fault occurs again, the NameNode is still added to the blacklist. + + +.. figure:: /_static/images/en-us_image_0000001296059928.jpg + :alt: **Figure 1** NameNode blacklisting working principle + + **Figure 1** NameNode blacklisting working principle + +Configuration Description +------------------------- + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** NameNode blacklisting parameters + + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +=========================================================+=========================================================================================================+=========================================================================+ + | dfs.client.failover.proxy.provider.\ *[nameservice ID]* | Client Failover proxy provider class which creates the NameNode proxy using the authenticated protocol. | org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider | + | | | | + | | Set this parameter to **org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider**. | | + | | | | + | | You can configure the observer NameNode to process read requests. 
| | + +---------------------------------------------------------+---------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_number_of_files_in_a_single_hdfs_directory.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_number_of_files_in_a_single_hdfs_directory.rst new file mode 100644 index 0000000..d493f89 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_number_of_files_in_a_single_hdfs_directory.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_0805.html + +.. _mrs_01_0805: + +Configuring the Number of Files in a Single HDFS Directory +========================================================== + +Scenario +-------- + +Generally, multiple services are deployed in a cluster, and the storage of most services depends on the HDFS file system. Different components such as Spark and Yarn or clients are constantly writing files to the same HDFS directory when the cluster is running. However, the number of files in a single directory in HDFS is limited. Users must plan to prevent excessive files in a single directory and task failure. + +You can set the number of files in a single directory using the **dfs.namenode.fs-limits.max-directory-items** parameter in HDFS. + +Procedure +--------- + +#. Go to the **All Configurations** page of HDFS by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. Search for the configuration item **dfs.namenode.fs-limits.max-directory-items**. + + .. table:: **Table 1** Parameter description + + +--------------------------------------------+----------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +============================================+========================================+=======================+ + | dfs.namenode.fs-limits.max-directory-items | Maximum number of items in a directory | 1048576 | + | | | | + | | Value range: 1 to 6,400,000 | | + +--------------------------------------------+----------------------------------------+-----------------------+ + +#. Set the maximum number of files that can be stored in a single HDFS directory. Save the modified configuration. Restart the expired service or instance for the configuration to take effect. + + .. note:: + + Plan data storage in advance based on time and service type categories to prevent excessive files in a single directory. You are advised to use the default value, which is about 1 million pieces of data in a single directory. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_recycle_bin_mechanism.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_recycle_bin_mechanism.rst new file mode 100644 index 0000000..6f404d2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_the_recycle_bin_mechanism.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_0806.html + +.. _mrs_01_0806: + +Configuring the Recycle Bin Mechanism +===================================== + +Scenario +-------- + +On HDFS, deleted files are moved to the recycle bin (trash can) so that the data deleted by mistake can be restored. + +You can set the time threshold for storing files in the recycle bin. Once the file storage duration exceeds the threshold, it is permanently deleted from the recycle bin. 
If the recycle bin is cleared, all files in the recycle bin are permanently deleted. + +Configuration Description +------------------------- + +If a file is deleted from HDFS, the file is saved in the trash space rather than cleared immediately. After the aging time is due, the deleted file becomes an aging file and will be cleared based on the system mechanism or manually cleared by users. + +**Parameter portal:** + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** Parameter description + + +------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +==============================+===================================================================================================================================================================================================================================================================================================================================================================================================================================================================+=======================+ + | fs.trash.interval | Trash collection time, in minutes. If data in the trash station exceeds the time, the data will be deleted. Value range: 1440 to 259200 | 2880 | + +------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | fs.trash.checkpoint.interval | Interval between trash checkpoints, in minutes. The value must be less than or equal to the value of **fs.trash.interval**. The checkpoint program creates a checkpoint every time it runs and removes the checkpoint created **fs.trash.interval** minutes ago. For example, the system checks whether aging files exist every 10 minutes and deletes aging files if any. Files that are not aging are stored in the checkpoint list waiting for the next check. | 60 | + | | | | + | | If this parameter is set to 0, the system does not check aging files and all aging files are saved in the system. | | + | | | | + | | Value range: 0 to *fs.trash.interval* | | + | | | | + | | .. note:: | | + | | | | + | | It is not recommended to set this parameter to 0 because aging files will use up the disk space of the cluster. 
| | + +------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/configuring_ulimit_for_hbase_and_hdfs.rst b/doc/component-operation-guide-lts/source/using_hdfs/configuring_ulimit_for_hbase_and_hdfs.rst new file mode 100644 index 0000000..e661b0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/configuring_ulimit_for_hbase_and_hdfs.rst @@ -0,0 +1,53 @@ +:original_name: mrs_01_0801.html + +.. _mrs_01_0801: + +Configuring ulimit for HBase and HDFS +===================================== + +Symptom +------- + +When you open an HDFS file, an error occurs due to the limit on the number of file handles. Information similar to the following is displayed. + +.. code-block:: + + IOException (Too many open files) + +Procedure +--------- + +You can contact the system administrator to add file handles for each user. This is a configuration on the OS instead of HBase or HDFS. It is recommended that the system administrator configure the number of file handles based on the service traffic of HBase and HDFS and the rights of each user. If a user performs a large number of operations frequently on the HDFS that has large service traffic, set the number of file handles of this user to a large value. + +#. Log in to the OSs of all nodes or clients in the cluster as user **root**, and go to the **/etc/security** directory. + +#. Run the following command to edit the **limits.conf** file: + + **vi limits.conf** + + Add the following information to the file. + + .. code-block:: + + hdfs - nofile 32768 + hbase - nofile 32768 + + **hdfs** and **hbase** indicate the usernames of the OSs that are used during the services. + + .. note:: + + - Only user **root** has the rights to edit the **limits.conf** file. + - If this modification does not take effect, check whether other nofile values exist in the **/etc/security/limits.d** directory. Such values may overwrite the values set in the **/etc/security/limits.conf** file. + - If a user needs to perform operations on HBase, set the number of file handles of this user to a value greater than **10000**. If a user needs to perform operations on HDFS, set the number of file handles of this user based on the service traffic. It is recommended that the value not be too small. If a user needs to perform operations on both HBase and HDFS, set the number of file handles of this user to a large value, such as **32768**. + +#. Run the following command to check the limit on the number of file handles of a user: + + **su -** *user_name* + + **ulimit -n** + + The limit on the number of file handles of this user is displayed as follows. + + .. code-block:: + + 8194 diff --git a/doc/component-operation-guide-lts/source/using_hdfs/creating_an_hdfs_role.rst b/doc/component-operation-guide-lts/source/using_hdfs/creating_an_hdfs_role.rst new file mode 100644 index 0000000..69c7907 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/creating_an_hdfs_role.rst @@ -0,0 +1,86 @@ +:original_name: mrs_01_1662.html + +.. 
_mrs_01_1662: + +Creating an HDFS Role +===================== + +Scenario +-------- + +This section describes how to create and configure an HDFS role on FusionInsight Manager. The HDFS role is granted the rights to read, write, and execute HDFS directories or files. + +A user has the complete permission on the created HDFS directories or files, that is, the user can directly read data from and write data to as well as authorize others to access the HDFS directories or files. + +.. note:: + + - An HDFS role can be created only in security mode. + - If the current component uses Ranger for permission control, HDFS policies must be configured based on Ranger for permission management. For details, see :ref:`Adding a Ranger Access Permission Policy for HDFS `. + +Prerequisites +------------- + +The system administrator has understood the service requirements. + +Procedure +--------- + +#. Log in to FusionInsight Manager, and choose **System** > **Permission** > **Role**. + +#. On the displayed page, click **Create Role** and fill in **Role Name** and **Description**. + +#. Configure the resource permission. For details, see :ref:`Table 1 `. + + **File System**: HDFS directory and file permission + + Common HDFS directories are as follows: + + - **flume**: Flume data storage directory + + - **hbase**: HBase data storage directory + + - **mr-history**: MapReduce task information storage directory + + - **tmp**: temporary data storage directory + + - **user**: user data storage directory + + .. _mrs_01_1662__en-us_topic_0000001219149039_tc5a4f557e6144488a1ace112bb8db6ee: + + .. table:: **Table 1** Setting a role + + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Task | Operation | + +===================================================================================================================+======================================================================================================================================+ + | Setting the HDFS administrator permission | In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS, and select **Cluster Admin Operations**. | + | | | + | | .. note:: | + | | | + | | The setting takes effect after the HDFS service is restarted. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to check and recover HDFS | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. Locate the save path of specified directories or files on HDFS. | + | | c. In the **Permission** column of the specified directories or files, select **Read** and **Execute**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to read directories or files of other users | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. 
Locate the save path of specified directories or files on HDFS. | + | | c. In the **Permission** column of the specified directories or files, select **Read** and **Execute**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to write data to files of other users | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. Locate the save path of specified files on HDFS. | + | | c. In the **Permission** column of the specified files, select **Write** and **Execute**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to create or delete sub-files or sub-directories in the directory of other users | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. Locate the path where the specified directory is saved in the HDFS. | + | | c. In the **Permission** column of the specified directories, select **Write** and **Execute**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for users to execute directories or files of other users | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. Locate the save path of specified directories or files on HDFS. | + | | c. In the **Permission** column of the specified directories or files, select **Execute**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission for allowing subdirectories to inherit all permissions of their parent directories | a. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > HDFS > **File System**. | + | | b. Locate the save path of specified directories or files on HDFS. | + | | c. In the **Permission** column of the specified directories or files, select **Recursive**. | + +-------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **OK**, and return to the **Role** page. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/blocks_miss_on_the_namenode_ui_after_the_successful_rollback.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/blocks_miss_on_the_namenode_ui_after_the_successful_rollback.rst new file mode 100644 index 0000000..96ec9ec --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/blocks_miss_on_the_namenode_ui_after_the_successful_rollback.rst @@ -0,0 +1,62 @@ +:original_name: mrs_01_1704.html + +.. 
_mrs_01_1704: + +Blocks Miss on the NameNode UI After the Successful Rollback +============================================================ + +Question +-------- + +Why are some blocks missing on the NameNode UI after the rollback is successful? + +Answer +------ + +This problem occurs because blocks with new IDs or genstamps may exist on the DataNode. The block files in the DataNode may have different generation flags and lengths from those in the rollback images of the NameNode. Therefore, the NameNode rejects these blocks in the DataNode and marks the files as damaged. + +**Scenarios:** + +#. Before an upgrade: + + Client A writes some data to file X. (Assume A bytes are written.) + +2. During an upgrade: + + Client A still writes data to file X. (The data in the file is A + B bytes.) + +3. After an upgrade: + + Client A completes the file writing. The final data is A + B bytes. + +4. Rollback started: + + The status will be rolled back to the status before the upgrade. That is, file X in NameNode will have A bytes, but block files in DataNode will have A + B bytes. + +**Recovery procedure:** + +#. Obtain the list of damaged files from NameNode web UI or run the following command to obtain: + + **hdfs fsck -list-corruptfileblocks** + +#. Run the following command to delete unnecessary files: + + **hdfs fsck - delete** + + .. note:: + + Deleting a file is a high-risk operation. Ensure that the files are no longer needed before performing this operation. + +#. For the required files, run the **fsck** command to obtain the block list and block sequence. + + - In the block sequence table provided, use the block ID to search for the data directory in the DataNode and download the corresponding block from the DataNode. + + - Write all such block files in appending mode based on the sequence to construct the original file. + + Example: + + File 1--> blk_1, blk_2, blk_3 + + Create a file by combining the contents of all three block files from the same sequence. + + - Delete the old file from HDFS and rewrite the new file. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/can_i_delete_or_modify_the_data_storage_directory_in_datanode.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/can_i_delete_or_modify_the_data_storage_directory_in_datanode.rst new file mode 100644 index 0000000..1ac57c2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/can_i_delete_or_modify_the_data_storage_directory_in_datanode.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_1703.html + +.. _mrs_01_1703: + +Can I Delete or Modify the Data Storage Directory in DataNode? +============================================================== + +Question +-------- + +- In DataNode, the storage directory of data blocks is specified by **dfs.datanode.data.dir**\ **.** Can I modify **dfs.datanode.data.dir** to modify the data storage directory? +- Can I modify files under the data storage directory? + +Answer +------ + +During the system installation, you need to configure the **dfs.datanode.data.dir** parameter to specify one or more root directories. + +- During the system installation, you need to configure the dfs.datanode.data.dir parameter to specify one or more root directories. + +- Exercise caution when modifying dfs.datanode.data.dir. You can configure this parameter to add a new data root directory. +- Do not modify or delete data blocks in the storage directory. Otherwise, the data blocks will lose. + +.. 
note:: + + Similarly, do not delete the storage directory, or modify or delete data blocks under the directory using the following parameters: + + - dfs.namenode.edits.dir + - dfs.namenode.name.dir + - dfs.journalnode.edits.dir diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/datanode_is_normal_but_cannot_report_data_blocks.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/datanode_is_normal_but_cannot_report_data_blocks.rst new file mode 100644 index 0000000..38e4faa --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/datanode_is_normal_but_cannot_report_data_blocks.rst @@ -0,0 +1,62 @@ +:original_name: mrs_01_1693.html + +.. _mrs_01_1693: + +DataNode Is Normal but Cannot Report Data Blocks +================================================ + +Question +-------- + +The DataNode is normal, but cannot report data blocks. As a result, the existing data blocks cannot be used. + +Answer +------ + +This error may occur when the number of data blocks in a data directory exceeds four times the upper limit (4 x 1 MB). And the DataNode generates the following error logs: + +.. code-block:: + + 2015-11-05 10:26:32,936 | ERROR | DataNode:[[[DISK]file:/srv/BigData/hadoop/data1/dn/]] heartbeating to + vm-210/10.91.8.210:8020 | Exception in BPOfferService for Block pool BP-805114975-10.91.8.210-1446519981645 + (Datanode Uuid bcada350-0231-413b-bac0-8c65e906c1bb) service to vm-210/10.91.8.210:8020 | BPServiceActor.java:824 + java.lang.IllegalStateException:com.google.protobuf.InvalidProtocolBufferException:Protocol message was too large.May + be malicious.Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:369) + at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:347) at org.apache.hadoop.hdfs. + protocol.BlockListAsLongs$BufferDecoder.getBlockListAsLongs(BlockListAsLongs.java:325) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB. + blockReport(DatanodeProtocolClientSideTranslatorPB.java:190) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:473) + at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:685) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:822) + at java.lang.Thread.run(Thread.java:745) Caused by:com.google.protobuf.InvalidProtocolBufferException:Protocol message was too large.May be malicious.Use CodedInputStream.setSizeLimit() + to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) + at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769) at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462) at com.google.protobuf. + CodedInputStream.readSInt64(CodedInputStream.java:363) at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:363) + +The number of data blocks in the data directory is displayed as **Metric**. You can monitor its value through **http://:/jmx**. If the value is greater than four times the upper limit (4 x 1 MB), you are advised to configure multiple drives and restart HDFS. + +**Recovery procedure:** + +#. Configure multiple data directories on the DataNode. 
+ + For example, configure multiple directories on the DataNode where only the **/data1/datadir** directory is configured: + + .. code-block:: + + dfs.datanode.data.dir /data1/datadir + + Configure as follows: + + .. code-block:: + + dfs.datanode.data.dir /data1/datadir/,/data2/datadir,/data3/datadir + + .. note:: + + You are advised to configure multiple data directories on multiple disks. Otherwise, performance may be affected. + +#. Restart the HDFS. + +#. Perform the following operation to move the data to the new data directory: + + **mv** */data1/datadir/current/finalized/subdir1 /data2/datadir/current/finalized/subdir1* + +#. Restart the HDFS. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/hdfs_webui_cannot_properly_update_information_about_damaged_data.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/hdfs_webui_cannot_properly_update_information_about_damaged_data.rst new file mode 100644 index 0000000..dfb9e56 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/hdfs_webui_cannot_properly_update_information_about_damaged_data.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1694.html + +.. _mrs_01_1694: + +HDFS WebUI Cannot Properly Update Information About Damaged Data +================================================================ + +Question +-------- + +#. When errors occur in the **dfs.datanode.data.dir** directory of DataNode due to the permission or disk damage, HDFS WebUI does not display information about damaged data. +#. After errors are restored, HDFS WebUI does not timely remove related information about damaged data. + +Answer +------ + +#. DataNode checks whether the disk is normal only when errors occur in file operations. Therefore, only when a data damage is detected and the error is reported to NameNode, NameNode displays information about the damaged data on HDFS WebUI. +#. After errors are fixed, you need to restart DataNode. During restarting DataNode, all data states are checked and damaged data information is uploaded to NameNode. Therefore, after errors are fixed, damaged data information is not displayed on the HDFS WebUI only by restarting DataNode. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/index.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/index.rst new file mode 100644 index 0000000..2ea0af9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/index.rst @@ -0,0 +1,50 @@ +:original_name: mrs_01_1690.html + +.. _mrs_01_1690: + +FAQ +=== + +- :ref:`NameNode Startup Is Slow ` +- :ref:`Why MapReduce Tasks Fails in the Environment with Multiple NameServices? ` +- :ref:`DataNode Is Normal but Cannot Report Data Blocks ` +- :ref:`HDFS WebUI Cannot Properly Update Information About Damaged Data ` +- :ref:`Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception? ` +- :ref:`Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated? ` +- :ref:`Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition? ` +- :ref:`Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage ` +- :ref:`Why Data in the Buffer Is Lost If a Power Outage Occurs During Storage of Small Files ` +- :ref:`Why Does Array Border-crossing Occur During FileInputFormat Split? ` +- :ref:`Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST? 
` +- :ref:`The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time ` +- :ref:`Can I Delete or Modify the Data Storage Directory in DataNode? ` +- :ref:`Blocks Miss on the NameNode UI After the Successful Rollback ` +- :ref:`Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS ` +- :ref:`Why are There Two Standby NameNodes After the active NameNode Is Restarted? ` +- :ref:`When Does a Balance Process in HDFS, Shut Down and Fail to be Executed Again? ` +- :ref:`"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI ` +- :ref:`NameNode Fails to Be Restarted Due to EditLog Discontinuity ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + namenode_startup_is_slow + why_mapreduce_tasks_fails_in_the_environment_with_multiple_nameservices + datanode_is_normal_but_cannot_report_data_blocks + hdfs_webui_cannot_properly_update_information_about_damaged_data + why_does_the_distcp_command_fail_in_the_secure_cluster,_causing_an_exception + why_does_datanode_fail_to_start_when_the_number_of_disks_specified_by_dfs.datanode.data.dir_equals_dfs.datanode.failed.volumes.tolerated + why_does_an_error_occur_during_datanode_capacity_calculation_when_multiple_data.dir_are_configured_in_a_partition + standby_namenode_fails_to_be_restarted_when_the_system_is_powered_off_during_metadata_namespace_storage + why_data_in_the_buffer_is_lost_if_a_power_outage_occurs_during_storage_of_small_files + why_does_array_border-crossing_occur_during_fileinputformat_split + why_is_the_storage_type_of_file_copies_disk_when_the_tiered_storage_policy_is_lazy_persist + the_hdfs_client_is_unresponsive_when_the_namenode_is_overloaded_for_a_long_time + can_i_delete_or_modify_the_data_storage_directory_in_datanode + blocks_miss_on_the_namenode_ui_after_the_successful_rollback + why_is_java.net.socketexception_no_buffer_space_available_reported_when_data_is_written_to_hdfs + why_are_there_two_standby_namenodes_after_the_active_namenode_is_restarted + when_does_a_balance_process_in_hdfs,_shut_down_and_fail_to_be_executed_again + this_page_cant_be_displayed_is_displayed_when_internet_explorer_fails_to_access_the_native_hdfs_ui + namenode_fails_to_be_restarted_due_to_editlog_discontinuity diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_fails_to_be_restarted_due_to_editlog_discontinuity.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_fails_to_be_restarted_due_to_editlog_discontinuity.rst new file mode 100644 index 0000000..33c4eab --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_fails_to_be_restarted_due_to_editlog_discontinuity.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_1709.html + +.. _mrs_01_1709: + +NameNode Fails to Be Restarted Due to EditLog Discontinuity +=========================================================== + +Question +-------- + +If a JournalNode server is powered off, the data directory disk is fully occupied, and the network is abnormal, the EditLog sequence number on the JournalNode is inconsecutive. In this case, the NameNode restart may fail. + +Symptom +------- + +The NameNode fails to be restarted. The following error information is reported in the NameNode run logs: + +|image1| + +Solution +-------- + +#. 
Find the active NameNode before the restart, go to its data directory (you can obtain the directory, such as **/srv/BigData/namenode/current** by checking the configuration item **dfs.namenode.name.dir**), and obtain the sequence number of the latest FsImage file, as shown in the following figure: + + |image2| + +#. Check the data directory of each JournalNode (you can obtain the directory such as\ **/srv/BigData/journalnode/hacluster/current** by checking the value of the configuration item **dfs.journalnode.edits.dir**), and check whether the sequence number starting from that obtained in step 1 is consecutive in edits files. That is, you need to check whether the last sequence number of the previous edits file is consecutive with the first sequence number of the next edits file. (As shown in the following figure, edits_0000000000013259231-0000000000013259237 and edits_0000000000013259239-0000000000013259246 are not consecutive.) + + |image3| + +#. If the edits files are not consecutive, check whether the edits files with the related sequence number exist in the data directories of other JournalNodes or NameNode. If the edits files can be found, copy a consecutive segment to the JournalNode. + +#. In this way, all inconsecutive edits files are restored. + +#. Restart the NameNode and check whether the restart is successful. If the fault persists, contact technical support. + +.. |image1| image:: /_static/images/en-us_image_0000001296219580.png +.. |image2| image:: /_static/images/en-us_image_0000001295740144.png +.. |image3| image:: /_static/images/en-us_image_0000001349139661.png diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_startup_is_slow.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_startup_is_slow.rst new file mode 100644 index 0000000..7f50f0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/namenode_startup_is_slow.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1691.html + +.. _mrs_01_1691: + +NameNode Startup Is Slow +======================== + +Question +-------- + +The NameNode startup is slow when it is restarted immediately after a large number of files (for example, 1 million files) are deleted. + +Answer +------ + +It takes time for the DataNode to delete the corresponding blocks after files are deleted. When the NameNode is restarted immediately, it checks the block information reported by all DataNodes. If a deleted block is found, the NameNode generates the corresponding INFO log information, as shown below: + +.. code-block:: + + 2015-06-10 19:25:50,215 | INFO | IPC Server handler 36 on 25000 | BLOCK* processReport: + blk_1075861877_2121067 on node 10.91.8.218:9866 size 10249 does not belong to any file | + org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1854) + +A log is generated for each deleted block. A file may contain one or more blocks. Therefore, after startup, the NameNode spends a large amount of time printing logs when a large number of files are deleted. As a result, the NameNode startup becomes slow. + +To address this issue, the following operations can be performed to speed up the startup: + +#. After a large number of files are deleted, wait until the DataNode deletes the corresponding blocks and then restart the NameNode. + + You can run the **hdfs dfsadmin -report** command to check the disk space and check whether the files have been deleted. + +#. 
If a large number of the preceding logs are generated, you can change the NameNode log level to **ERROR** so that the NameNode stops printing such logs. + + After the NameNode is restarted, change the log level back to **INFO**. You do not need to restart the service after changing the log level. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/standby_namenode_fails_to_be_restarted_when_the_system_is_powered_off_during_metadata_namespace_storage.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/standby_namenode_fails_to_be_restarted_when_the_system_is_powered_off_during_metadata_namespace_storage.rst new file mode 100644 index 0000000..55b48fe --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/standby_namenode_fails_to_be_restarted_when_the_system_is_powered_off_during_metadata_namespace_storage.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1698.html + +.. _mrs_01_1698: + +Standby NameNode Fails to Be Restarted When the System Is Powered off During Metadata (Namespace) Storage +========================================================================================================= + +Question +-------- + +When the standby NameNode is powered off during metadata (namespace) storage, it fails to be started and the following error information is displayed. + +|image1| + +Answer +------ + +When the standby NameNode is powered off during metadata (namespace) storage, it fails to be started because the MD5 file is damaged. Remove the damaged fsimage and then start the standby NameNode to rectify the fault. After the rectification, the standby NameNode loads the previous fsimage and reproduces all edits. + +Recovery procedure: + +#. Run the following command to remove the damaged fsimage: + + **rm -rf ${BIGDATA_DATA_HOME}/namenode/current/fsimage_0000000000000096** + +#. Start the standby NameNode. + +.. |image1| image:: /_static/images/en-us_image_0000001349259025.png diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/the_hdfs_client_is_unresponsive_when_the_namenode_is_overloaded_for_a_long_time.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/the_hdfs_client_is_unresponsive_when_the_namenode_is_overloaded_for_a_long_time.rst new file mode 100644 index 0000000..8687486 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/the_hdfs_client_is_unresponsive_when_the_namenode_is_overloaded_for_a_long_time.rst @@ -0,0 +1,45 @@ +:original_name: mrs_01_1702.html + +.. _mrs_01_1702: + +The HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time +=============================================================================== + +**Question** +------------ + +When the NameNode node is overloaded (100% of the CPU is occupied), the NameNode is unresponsive. The HDFS clients that are connected to the overloaded NameNode fail to run properly. However, the HDFS clients that are newly connected to the NameNode will be switched to a backup NameNode and run properly. + +**Answer** +---------- + +When the preceding error occurs, the default configuration (as described in :ref:`Table 1 `) is in effect: the **keep alive** mechanism is enabled for the RPC connection between the HDFS client and the NameNode. The **keep alive** mechanism keeps the HDFS client waiting for a response from the server and prevents the connection from timing out, which makes the HDFS client unresponsive.
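For reference, the following is a minimal sketch of the client-side **core-site.xml** settings described in Table 1 below, using the values recommended there for a NameNode that stays overloaded for a long time. Adjust the values to your actual cluster before use.

.. code-block::

   <!-- Sketch only: parameters from Table 1 with the recommended values for a long-overloaded NameNode -->
   <property>
     <name>ipc.client.ping</name>
     <value>false</value>
   </property>
   <property>
     <name>ipc.ping.interval</name>
     <value>900000</value>
   </property>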
+ +Perform the following operations to the unresponsive HDFS client: + +- Leave the HDFS client waiting. Once the CPU usage of the node where NameNode locates drops, the NameNode will obtain CPU resources and the HDFS client will receive a response. +- If you do not want to leave the HDFS client running, restart the application where the HDFS client locates to reconnect the HDFS client to another idle NameNode. + +Procedure: + +Configure the following parameters in the **c**\ **ore-site.xml** file on the client. + +.. _mrs_01_1702__en-us_topic_0000001219350525_tf99cac42ab7947b3bffe186b74e79d38: + +.. table:: **Table 1** Parameter description + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=======================+==========================================================================================================================================================================================================================+=======================+ + | ipc.client.ping | If the **ipc.client.ping** parameter is configured to **true**, the HDFS client will wait for the response from the server and periodically send the **ping** message to avoid disconnection caused by **tcp timeout**. | true | + | | | | + | | If the **ipc.client.ping** parameter is configured to **false**, the HDFS client will set the value of **ipc.ping.interval** as the timeout time. If no response is received within that time, timeout occurs. | | + | | | | + | | To avoid the unresponsiveness of HDFS when the NameNode is overloaded for a long time, you are advised to set the parameter to **false**. | | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | ipc.ping.interval | If the value of **ipc.client.ping** is **true**, **ipc.ping.interval** indicates the interval between sending the ping messages. | 60000 | + | | | | + | | If the value of **ipc.client.ping** is **false**, **ipc.ping.interval** indicates the timeout time for connection. | | + | | | | + | | To avoid the unresponsiveness of HDFS when the NameNode is overloaded for a long time, you are advised to set the parameter to a large value, for example **900000** (unit ms) to avoid timeout when the server is busy. 
| | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/this_page_cant_be_displayed_is_displayed_when_internet_explorer_fails_to_access_the_native_hdfs_ui.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/this_page_cant_be_displayed_is_displayed_when_internet_explorer_fails_to_access_the_native_hdfs_ui.rst new file mode 100644 index 0000000..74a6887 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/this_page_cant_be_displayed_is_displayed_when_internet_explorer_fails_to_access_the_native_hdfs_ui.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_1708.html + +.. _mrs_01_1708: + +"This page can't be displayed" Is Displayed When Internet Explorer Fails to Access the Native HDFS UI +===================================================================================================== + +Question +-------- + +Occasionally, nternet Explorer 9, Explorer 10, or Explorer 11 fails to access the native HDFS UI. + +Symptom +------- + +Internet Explorer 9, Explorer 10, or Explorer 11 fails to access the native HDFS UI, as shown in the following figure. + +|image1| + +Cause +----- + +Some Internet Explorer 9, Explorer 10, or Explorer 11versions fail to handle SSL handshake issues, causing access failure. + +Solution +-------- + +Refresh the page. + +.. |image1| image:: /_static/images/en-us_image_0000001349059765.jpg diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/when_does_a_balance_process_in_hdfs,_shut_down_and_fail_to_be_executed_again.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/when_does_a_balance_process_in_hdfs,_shut_down_and_fail_to_be_executed_again.rst new file mode 100644 index 0000000..4e96264 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/when_does_a_balance_process_in_hdfs,_shut_down_and_fail_to_be_executed_again.rst @@ -0,0 +1,37 @@ +:original_name: mrs_01_1707.html + +.. _mrs_01_1707: + +When Does a Balance Process in HDFS, Shut Down and Fail to be Executed Again? +============================================================================= + +Question +-------- + +After I start a Balance process in HDFS, the process is shut down abnormally. If I attempt to execute the Balance process again, it fails again. + +Answer +------ + +After a Balance process is executed in HDFS, another Balance process can be executed only after the **/system/balancer.id** file is automatically released. + +However, if a Balance process is shut down abnormally, the **/system/balancer.id** has not been released when the Balance is executed again, which triggers the **append /system/balancer.id** operation. + +- If the time spent on releasing the **/system/balancer.id** file exceeds the soft-limit lease period 60 seconds, executing the Balance process again triggers the append operation, which preempts the lease. The last block is in construction or under recovery status, which triggers the block recovery operation. The **/system/balancer.id** file cannot be closed until the block recovery completes. Therefore, the append operation fails. + + After the **append /system/balancer.id** operation fails, the exception message **RecoveryInProgressException** is displayed. + + .. 
code-block:: + + org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.protocol.RecoveryInProgressException): Failed to APPEND_FILE /system/balancer.id for DFSClient because lease recovery is in progress. Try again later. + +- If the time spent on releasing the **/system/balancer.id** file is within 60 seconds, the original client continues to own the lease and the exception AlreadyBeingCreatedException occurs and null is returned to the client. The following exception message is displayed on the client: + + .. code-block:: + + java.io.IOException: Cannot create any NameNode Connectors.. Exiting... + +Either of the following methods can be used to solve the problem: + +- Execute the Balance process again after the hard-limit lease period expires for 1 hour, when the original client has released the lease. +- Delete the **/system/balancer.id** file before executing the Balance process again. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_are_there_two_standby_namenodes_after_the_active_namenode_is_restarted.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_are_there_two_standby_namenodes_after_the_active_namenode_is_restarted.rst new file mode 100644 index 0000000..f423521 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_are_there_two_standby_namenodes_after_the_active_namenode_is_restarted.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_1706.html + +.. _mrs_01_1706: + +Why are There Two Standby NameNodes After the active NameNode Is Restarted? +=========================================================================== + +Question +-------- + +Why are there two standby NameNodes after the active NameNode is restarted? + +When this problem occurs, check the ZooKeeper and ZooKeeper FC logs. You can find that the sessions used for the communication between the ZooKeeper server and client (ZKFC) are inconsistent. The session ID of the ZooKeeper server is **0x164cb2b3e4b36ae4**, and the session ID of the ZooKeeper FC is **0x144cb2b3e4b36ae4**. Such inconsistency means that the data interaction between the ZooKeeper server and ZKFC fails. + +Content of the ZooKeeper log is as follows: + +.. code-block:: + + 2015-04-15 21:24:54,257 | INFO | CommitProcessor:22 | Established session 0x164cb2b3e4b36ae4 with negotiated timeout 45000 for client /192.168.0.117:44586 | org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:623) + 2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | Successfully authenticated client: authenticationID=hdfs/hadoop@; authorizationID=hdfs/hadoop@. 
| org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:118) + 2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | Setting authorizedID: hdfs/hadoop@ | org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:134) + 2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | adding SASL authorization for authorizationID: hdfs/hadoop@ | org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1009) + 2015-04-15 21:24:54,262 | INFO | ProcessThread(sid:22 cport:-1): | Got user-level KeeperException when processing sessionid:0x164cb2b3e4b36ae4 type:create cxid:0x3 zxid:0x20009fafc txntype:-1 reqpath:n/a Error Path:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/hacluster/ActiveStandbyElectorLock | org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:648) + +Content of the ZKFC log is as follows: + +.. code-block:: + + 2015-04-15 21:24:54,237 | INFO | main-SendThread(192-168-0-114:2181) | Socket connection established to 192-168-0-114/192.168.0.114:2181, initiating session | org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:854) + 2015-04-15 21:24:54,257 | INFO | main-SendThread(192-168-0-114:2181) | Session establishment complete on server 192-168-0-114/192.168.0.114:2181, sessionid = 0x144cb2b3e4b36ae4 , negotiated timeout = 45000 | org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1259) + 2015-04-15 21:24:54,260 | INFO | main-EventThread | EventThread shut down | org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:512) + 2015-04-15 21:24:54,262 | INFO | main-EventThread | Session connected. | org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:547) + 2015-04-15 21:24:54,264 | INFO | main-EventThread | Successfully authenticated to ZooKeeper using SASL. | org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:573) + +Answer +------ + +- Cause Analysis + + After the active NameNode restarts, the temporary node **/hadoop-ha/hacluster/ActiveStandbyElectorLock** created on ZooKeeper is deleted. After the standby NameNode receives that information that the **/hadoop-ha/hacluster/ActiveStandbyElectorLock** node is deleted, the standby NameNode creates the /**hadoop-ha/hacluster/ActiveStandbyElectorLock** node in ZooKeeper in order to switch to the active NameNode. However, when the standby NameNode connects with ZooKeeper through the client ZKFC, the session ID of ZKFC differs from that of ZooKeeper due to network issues, overload CPU, or overload clusters. In this case, the watcher of the standby NameNode fails to detect that the temporary node has been successfully created, and fails to consider the standby NameNode as the active NameNode. After the original active NameNode restarts, it detects that the **/hadoop-ha/hacluster/ActiveStandbyElectorLock** already exists and becomes the standby NameNode. Therefore, both NameNodes are standby NameNodes. + +- Solution + + You are advised to restart two ZKFCs of HDFS on FusionInsight Manager. 
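Before restarting the ZKFCs, you can confirm the session mismatch described above by comparing the session IDs printed in the two logs. The following is a rough sketch only: the ZKFC run log directory follows the default **/var/log/Bigdata/hdfs/zkfc** location, while the ZooKeeper server log path is an assumption and must be replaced with the actual path in your cluster.

.. code-block::

   # Session ID reported by the ZooKeeper server (log path is an assumption; replace it with the actual path)
   grep "Established session" /var/log/Bigdata/zookeeper/*.log | tail -n 5

   # Session ID negotiated by ZKFC (default ZKFC run log directory)
   grep "Session establishment complete" /var/log/Bigdata/hdfs/zkfc/*.log | tail -n 5

   # If the two session IDs differ (for example, 0x164cb2b3e4b36ae4 on the server and 0x144cb2b3e4b36ae4 in ZKFC),
   # the fault described above is confirmed.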
diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_data_in_the_buffer_is_lost_if_a_power_outage_occurs_during_storage_of_small_files.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_data_in_the_buffer_is_lost_if_a_power_outage_occurs_during_storage_of_small_files.rst new file mode 100644 index 0000000..0019b08 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_data_in_the_buffer_is_lost_if_a_power_outage_occurs_during_storage_of_small_files.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1699.html + +.. _mrs_01_1699: + +Why Data in the Buffer Is Lost If a Power Outage Occurs During Storage of Small Files +===================================================================================== + +Question +-------- + +Why is data in the buffer lost if a power outage occurs during the storage of small files? + +Answer +------ + +After a write operation is completed, the blocks in the buffer are not immediately written to the disk; if a power outage occurs before they are flushed, the buffered data is lost. To enable synchronization of blocks to the disk, set **dfs.datanode.synconclose** to **true** in the **hdfs-site.xml** file. + +By default, **dfs.datanode.synconclose** is set to **false**. This improves the performance but can cause buffered data loss in the case of a power outage. Therefore, it is recommended that **dfs.datanode.synconclose** be set to **true** even though this may affect the performance. You can determine whether to enable the synchronization function based on your actual situation. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_an_error_occur_during_datanode_capacity_calculation_when_multiple_data.dir_are_configured_in_a_partition.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_an_error_occur_during_datanode_capacity_calculation_when_multiple_data.dir_are_configured_in_a_partition.rst new file mode 100644 index 0000000..4ff3a1f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_an_error_occur_during_datanode_capacity_calculation_when_multiple_data.dir_are_configured_in_a_partition.rst @@ -0,0 +1,51 @@ +:original_name: mrs_01_1697.html + +.. _mrs_01_1697: + +Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition? +================================================================================================================== + +Question +-------- + +The DataNode capacity is calculated incorrectly if several data.dir directories are configured in one disk partition. + +Answer +------ + +Currently, the capacity is calculated per disk, similar to the **df** command in Linux. Ideally, do not configure multiple directories on the same disk; otherwise, all the data is written to one disk, which greatly affects performance. + +Therefore, it is better to configure the directories as shown below. + +For example, if the machine has the following disks: + +.. code-block:: + + host-4:~ # df -h + Filesystem Size Used Avail Use% Mounted on + /dev/sda1 352G 11G 324G 4% / + udev 190G 252K 190G 1% /dev + tmpfs 190G 72K 190G 1% /dev/shm + /dev/sdb1 2.7T 74G 2.5T 3% /data1 + /dev/sdc1 2.7T 75G 2.5T 3% /data2 + /dev/sdd1 2.7T 73G 2.5T 3% /data + +Suggested way of configuration: + +.. code-block:: + + <property> + <name>dfs.datanode.data.dir</name> + <value>/data1/datadir/,/data2/datadir,/data3/datadir</value> + </property> + +The following is not recommended: + +..
code-block:: + + + dfs.datanode.data.dir + /data1/datadir1/,/data2/datadir1,/data3/datadir1,/data1/datadir2/data1/datadir3,/data2/datadir2,/data2/datadir3,/data3/datadir2,/data3/datadir3 + diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_array_border-crossing_occur_during_fileinputformat_split.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_array_border-crossing_occur_during_fileinputformat_split.rst new file mode 100644 index 0000000..f5ba885 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_array_border-crossing_occur_during_fileinputformat_split.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_1700.html + +.. _mrs_01_1700: + +Why Does Array Border-crossing Occur During FileInputFormat Split? +================================================================== + +Question +-------- + +When HDFS calls the FileInputFormat getSplit method, the ArrayIndexOutOfBoundsException: 0 appears in the following log: + +.. code-block:: + + java.lang.ArrayIndexOutOfBoundsException: 0 + at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708) + at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675) + at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359) + at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:210) + at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) + at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) + at scala.Option.getOrElse(Option.scala:120) + at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) + at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) + +Answer +------ + +The elements of each block correspondent frame are as below: /default/rack0/:,/default/rack0/datanodeip:port. + +The problem is due to a block damage or loss, making the block correspondent machine ip and port become null. Use **hdfs fsck** to check the file blocks health state when this problem occurs, and remove damaged block or restore the missing block to re-computing the task. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_datanode_fail_to_start_when_the_number_of_disks_specified_by_dfs.datanode.data.dir_equals_dfs.datanode.failed.volumes.tolerated.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_datanode_fail_to_start_when_the_number_of_disks_specified_by_dfs.datanode.data.dir_equals_dfs.datanode.failed.volumes.tolerated.rst new file mode 100644 index 0000000..4b1b9db --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_datanode_fail_to_start_when_the_number_of_disks_specified_by_dfs.datanode.data.dir_equals_dfs.datanode.failed.volumes.tolerated.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1696.html + +.. _mrs_01_1696: + +Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated? +========================================================================================================================================= + +Question +-------- + +If the number of disks specified by **dfs.datanode.data.dir** is equal to the value of **dfs.datanode.failed.volumes.tolerated**, DataNode startup will fail. 
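For reference, the following **hdfs-site.xml** sketch illustrates the relationship in question; the directory paths are placeholders. With three data directories configured, **dfs.datanode.failed.volumes.tolerated** must be smaller than 3, as explained in the Answer below.

.. code-block::

   <!-- Sketch only: three data directories, so at most two failed volumes can be tolerated -->
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>/data1/datadir,/data2/datadir,/data3/datadir</value>
   </property>
   <property>
     <name>dfs.datanode.failed.volumes.tolerated</name>
     <value>2</value>
   </property>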
+ +Answer +------ + +By default, the failure of a single disk will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replication of blocks that reside on disks that have not failed. + +To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir directories by using the **dfs.datanode.failed.volumes.tolerated** parameter in **hdfs-site.xml**. For example, if the value of this parameter is 3, the DataNode shuts down only after four or more data directories have failed. This value is respected on DataNode startup. + +The number of tolerated failed volumes must always be less than the number of configured volumes. Alternatively, you can set this parameter to **-1**, which is equivalent to n-1 (where n is the number of disks); in this case, the DataNode is not shut down as long as at least one disk is still available. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_the_distcp_command_fail_in_the_secure_cluster,_causing_an_exception.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_the_distcp_command_fail_in_the_secure_cluster,_causing_an_exception.rst new file mode 100644 index 0000000..fde4f50 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_does_the_distcp_command_fail_in_the_secure_cluster,_causing_an_exception.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_1695.html + +.. _mrs_01_1695: + +Why Does the Distcp Command Fail in the Secure Cluster, Causing an Exception? +============================================================================= + +Question +-------- + +Why does the **distcp** command fail in the secure cluster with the following error displayed? + +Client-side exception: + +.. code-block:: + + Invalid arguments: Unexpected end of file from server + +Server-side exception: + +.. code-block:: + + javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection? + +Answer +------ + +The preceding error may occur if **webhdfs://** is used in the distcp command. The reason is that the big data cluster uses the HTTPS mechanism, that is, **dfs.http.policy** is set to **HTTPS_ONLY** in the **core-site.xml** file. To avoid the error, replace **webhdfs://** with **swebhdfs://** in the command. + +For example: + +**./hadoop distcp** **swebhdfs://IP:PORT/testfile hdfs://IP:PORT/testfile1** diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_java.net.socketexception_no_buffer_space_available_reported_when_data_is_written_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_java.net.socketexception_no_buffer_space_available_reported_when_data_is_written_to_hdfs.rst new file mode 100644 index 0000000..96545ce --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_java.net.socketexception_no_buffer_space_available_reported_when_data_is_written_to_hdfs.rst @@ -0,0 +1,75 @@ +:original_name: mrs_01_1705.html + +.. _mrs_01_1705: + +Why Is "java.net.SocketException: No buffer space available" Reported When Data Is Written to HDFS +================================================================================================== + +Question +-------- + +Why is a "java.net.SocketException: No buffer space available" exception reported when data is written to HDFS? + +This problem occurs when files are written to HDFS. Check the error logs of the client and DataNode. + +The client logs are as follows: + + +..
figure:: /_static/images/en-us_image_0000001296219452.jpg + :alt: **Figure 1** Client logs + + **Figure 1** Client logs + +DataNode logs are as follows: + +.. code-block:: + + 2017-07-24 20:43:39,269 | ERROR | DataXceiver for client DFSClient_NONMAPREDUCE_996005058_86 + at /192.168.164.155:40214 [Receiving block BP-1287143557-192.168.199.6-1500707719940:blk_1074269754_528941 with io weight 10] | DataNode{data=FSDataset{dirpath='[/srv/BigData/hadoop/data1/dn/current, /srv/BigData/hadoop/data2/dn/current, /srv/BigData/hadoop/data3/dn/current, /srv/BigData/hadoop/data4/dn/current, /srv/BigData/hadoop/data5/dn/current, /srv/BigData/hadoop/data6/dn/current, /srv/BigData/hadoop/data7/dn/current]'}, localName='192-168-164-155:9866', datanodeUuid='a013e29c-4e72-400c-bc7b-bbbf0799604c', xmitsInProgress=0}:Exception transfering block BP-1287143557-192.168.199.6-1500707719940:blk_1074269754_528941 to mirror 192.168.202.99:9866: java.net.SocketException: No buffer space available | DataXceiver.java:870 + 2017-07-24 20:43:39,269 | INFO | DataXceiver for client DFSClient_NONMAPREDUCE_996005058_86 + at /192.168.164.155:40214 [Receiving block BP-1287143557-192.168.199.6-1500707719940:blk_1074269754_528941 with io weight 10] | opWriteBlock BP-1287143557-192.168.199.6-1500707719940:blk_1074269754_528941 received exception java.net.SocketException: No buffer space available | DataXceiver.java:933 + 2017-07-24 20:43:39,270 | ERROR | DataXceiver for client DFSClient_NONMAPREDUCE_996005058_86 + at /192.168.164.155:40214 [Receiving block BP-1287143557-192.168.199.6-1500707719940:blk_1074269754_528941 with io weight 10] | 192-168-164-155:9866:DataXceiver error processing WRITE_BLOCK operation src: /192.168.164.155:40214 dst: /192.168.164.155:9866 | DataXceiver.java:304 java.net.SocketException: No buffer space available + at sun.nio.ch.Net.connect0(Native Method) + at sun.nio.ch.Net.connect(Net.java:454) + at sun.nio.ch.Net.connect(Net.java:446) + at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) + at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) + at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) + at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) + at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:800) + at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:138) + at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) + at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:265) + at java.lang.Thread.run(Thread.java:748) + +Answer +------ + +The preceding problem may be caused by network memory exhaustion. + +You can increase the threshold of the network device based on the actual scenario. + +Example: + +.. code-block:: console + + [root@xxxxx ~]# cat /proc/sys/net/ipv4/neigh/default/gc_thresh* + 128 + 512 + 1024 + [root@xxxxx ~]# echo 512 > /proc/sys/net/ipv4/neigh/default/gc_thresh1 + [root@xxxxx ~]# echo 2048 > /proc/sys/net/ipv4/neigh/default/gc_thresh2 + [root@xxxxx ~]# echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh3 + [root@xxxxx ~]# cat /proc/sys/net/ipv4/neigh/default/gc_thresh* + 512 + 2048 + 4096 + +You can also add the following parameters to the **/etc/sysctl.conf** file. The configuration takes effect even if the host is restarted. + +.. 
code-block:: + + net.ipv4.neigh.default.gc_thresh1 = 512 + net.ipv4.neigh.default.gc_thresh2 = 2048 + net.ipv4.neigh.default.gc_thresh3 = 4096 diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_the_storage_type_of_file_copies_disk_when_the_tiered_storage_policy_is_lazy_persist.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_the_storage_type_of_file_copies_disk_when_the_tiered_storage_policy_is_lazy_persist.rst new file mode 100644 index 0000000..b4c7692 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_is_the_storage_type_of_file_copies_disk_when_the_tiered_storage_policy_is_lazy_persist.rst @@ -0,0 +1,21 @@ +:original_name: mrs_01_1701.html + +.. _mrs_01_1701: + +Why Is the Storage Type of File Copies DISK When the Tiered Storage Policy Is LAZY_PERSIST? +=========================================================================================== + +Question +-------- + +When the storage policy of the file is set to **LAZY_PERSIST**, the storage type of the first replica should be **RAM_DISK**, and the storage type of other replicas should be **DISK**. + +But why is the storage type of all copies shown as **DISK** actually? + +Answer +------ + +When a user writes into a file whose storage policy is **LAZY_PERSIST**, three replicas are written one by one. The first replica is preferentially written into the DataNode where the client is located. The storage type of all replicas is **DISK** in the following scenarios: + +- If the DataNode where the client is located does not have the RAM disk, the first replica is written into the disk of the DataNode where the client is located, and other replicas are written into the disks of other nodes. +- If the DataNode where the client is located has the RAM disk, and the value of **dfs.datanode.max.locked.memory** is not specified or smaller than the value of **dfs.blocksize**, the first replica is written into the disk of the DataNode where the client is located, and other replicas are written into the disks of other nodes. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/faq/why_mapreduce_tasks_fails_in_the_environment_with_multiple_nameservices.rst b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_mapreduce_tasks_fails_in_the_environment_with_multiple_nameservices.rst new file mode 100644 index 0000000..8796d09 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/faq/why_mapreduce_tasks_fails_in_the_environment_with_multiple_nameservices.rst @@ -0,0 +1,53 @@ +:original_name: mrs_01_1692.html + +.. _mrs_01_1692: + +Why MapReduce Tasks Fails in the Environment with Multiple NameServices? +======================================================================== + +Question +-------- + +Why MapReduce or Yarn tasks using the viewFS function fail to be executed in the environment with multiple NameServices? + +Answer +------ + +When viewFS is used, only directories mounted to viewFS can be accessed. Therefore, the most possible reason is that the configured path is not on the mount point of viewFS. Example: + +.. code-block:: + + + fs.defaultFS + viewfs://ClusterX/ + + + fs.viewfs.mounttable.ClusterX.link./folder1 + hdfs://NS1/folder1 + + + fs.viewfs.mounttable.ClusterX.link./folder2 + hdfs://NS2/folder2 + + +In the MR configuration that depends on the HDFS, the mounted directory needs to be used. + +**Incorrect example**: + +.. 
code-block:: + + + yarn.app.mapreduce.am.staging-dir + /tmp/hadoop-yarn/staging + + +The root directory (/) cannot be accessed in viewFS. + +**Correct example**: + +.. code-block:: + + + yarn.app.mapreduce.am.staging-dir + /folder1/tmp/hadoop-yarn/staging + diff --git a/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_read_performance_using_client_metadata_cache.rst b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_read_performance_using_client_metadata_cache.rst new file mode 100644 index 0000000..40e9e24 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_read_performance_using_client_metadata_cache.rst @@ -0,0 +1,63 @@ +:original_name: mrs_01_1688.html + +.. _mrs_01_1688: + +Improving Read Performance Using Client Metadata Cache +====================================================== + +Scenario +-------- + +Improve the HDFS read performance by using the client to cache the metadata for block locations. + +.. note:: + + This function is recommended only for reading files that are not modified frequently. Because the data modification done on the server side by some other client is invisible to the cache client, which may cause the metadata obtained from the cache to be outdated. + +Procedure +--------- + +**Navigation path for setting parameters:** + +On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Configurations**, select **All Configurations**, and enter the parameter name in the search box. + +.. table:: **Table 1** Parameter configuration + + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +=======================================+=======================================================================================================================================================================================================================================================================+=======================+ + | dfs.client.metadata.cache.enabled | Enables or disables the client to cache the metadata for block locations. Set this parameter to **true** and use it along with the **dfs.client.metadata.cache.pattern** parameter to enable the cache. | false | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | dfs.client.metadata.cache.pattern | Indicates the regular expression pattern of the path of the file to be cached. Only the metadata for block locations of these files is cached until the metadata expires. This parameter is valid only when **dfs.client.metadata.cache.enabled** is set to **true**. | ``-`` | + | | | | + | | Example: **/test.\*** indicates that all files whose paths start with **/test** are read. | | + | | | | + | | .. note:: | | + | | | | + | | - To ensure consistency, configure a specific mode to cache only files that are not frequently modified by other clients. 
| | + | | | | + | | - The regular expression pattern verifies only the path of the URI, but not the schema and authority in the case of the Fully Qualified path. | | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | dfs.client.metadata.cache.expiry.sec | Indicates the duration for caching metadata. The cache entry becomes invalid after its caching time exceeds this duration. Even metadata that is frequently used during the caching process can become invalid. | 60s | + | | | | + | | Time suffixes **s**/**m**/**h** can be used to indicate second, minute, and hour, respectively. | | + | | | | + | | .. note:: | | + | | | | + | | If this parameter is set to **0s**, the cache function is disabled. | | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | dfs.client.metadata.cache.max.entries | Indicates the maximum number of non-expired data items that can be cached at a time. | 65536 | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + +.. note:: + + Call *DFSClient#clearLocatedBlockCache()* to completely clear the client cache before it expires. + + The sample usage is as follows: + + .. code-block:: + + FileSystem fs = FileSystem.get(conf); + DistributedFileSystem dfs = (DistributedFileSystem) fs; + DFSClient dfsClient = dfs.getClient(); + dfsClient.clearLocatedBlockCache(); diff --git a/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_the_connection_between_the_client_and_namenode_using_current_active_cache.rst b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_the_connection_between_the_client_and_namenode_using_current_active_cache.rst new file mode 100644 index 0000000..1e7e40f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_the_connection_between_the_client_and_namenode_using_current_active_cache.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_1689.html + +.. _mrs_01_1689: + +Improving the Connection Between the Client and NameNode Using Current Active Cache +=================================================================================== + +Scenario +-------- + +When HDFS is deployed in high availability (HA) mode with multiple NameNode instances, the HDFS client needs to connect to each NameNode in sequence to determine which is the active NameNode and use it for client operations. + +Once the active NameNode is identified, its details can be cached and shared to all clients running on the client host. 
In this way, each new client first tries to load the details of the active Name Node from the cache and save the RPC call to the standby NameNode, which can help a lot in abnormal scenarios, for example, when the standby NameNode cannot be connected for a long time. + +When a fault occurs and the other NameNode is switched to the active state, the cached details are updated to the information about the current active NameNode. + +Procedure +--------- + +Navigation path for setting parameters: + +On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Configurations**, select **All Configurations**, and enter the parameter name in the search box. + +.. table:: **Table 1** Configuration parameters + + +-----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | Parameter | Description | Default Value | + +=====================================================+============================================================================================================================================================================================================================================================================================================================================================================================================================================================+=========================================================================+ + | dfs.client.failover.proxy.provider.[nameservice ID] | Client Failover proxy provider class which creates the NameNode proxy using the authenticated protocol. If this parameter is set to **org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider**, you can use the NameNode blacklist feature on the HDFS client. If this parameter is set to **org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider**, you can configure the observer NameNode to process read requests. | org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider | + +-----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | dfs.client.failover.activeinfo.share.flag | Specifies whether to enable the cache function and share the detailed information about the current active NameNode with other clients. Set it to **true** to enable the cache function. 
| false | + +-----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | dfs.client.failover.activeinfo.share.path | Specifies the local directory for storing the shared files created by all clients in the host. If a cache area is to be shared by different users, the directory must have required permissions (for example, creating, reading, and writing cache files in the specified directory). | /tmp | + +-----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + | dfs.client.failover.activeinfo.share.io.timeout.sec | (Optional) Used to control timeout. The cache file is locked when it is being read or written, and if the file cannot be locked within the specified time, the attempt to read or update the caches will be abandoned. The unit is second. | 5 | + +-----------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+ + +.. note:: + + The cache files created by the HDFS client are reused by other clients, and thus these files will not be deleted from the local system. If this function is disabled, you may need to manually clear the data. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_write_performance.rst b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_write_performance.rst new file mode 100644 index 0000000..ff656f2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/improving_write_performance.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_1687.html + +.. _mrs_01_1687: + +Improving Write Performance +=========================== + +Scenario +-------- + +Improve the HDFS write performance by modifying the HDFS attributes. + +Procedure +--------- + +Navigation path for setting parameters: + +On FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Configurations** and select **All Configurations**. Enter a parameter name in the search box. + +.. 
table:: **Table 1** Parameters for improving HDFS write performance + + +--------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +======================================+=======================================================================================================================================================================================================================================================================================================================================+=======================+ + | dfs.datanode.drop.cache.behind.reads | Specifies whether to enable a DataNode to automatically clear all data in the cache after the data in the cache is transferred to the client. | false | + | | | | + | | If it is set to **true**, the cached data is discarded. The parameter needs to be configured on the DataNode. | | + | | | | + | | You are advised to set it to **true** if data is repeatedly read only a few times, so that the cache can be used by other operations. You are advised to set it to **false** if data is repeatedly read many times to enhance the reading speed. | | + +--------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | dfs.client-write-packet-size | Specifies the size of the client write packet. When the HDFS client writes data to the DataNode, the data will be accumulated until a packet is generated. Then, the packet is transmitted over the network. This parameter specifies the size (unit: byte) of the data packet to be transmitted, which can be specified by each job. | 262144 | + | | | | + | | In the 10-Gigabit network, you can increase the value of this parameter to enhance the transmission throughput. | | + +--------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/index.rst new file mode 100644 index 0000000..f4d2a86 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/hdfs_performance_tuning/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_0829.html + +.. _mrs_01_0829: + +HDFS Performance Tuning +======================= + +- :ref:`Improving Write Performance ` +- :ref:`Improving Read Performance Using Client Metadata Cache ` +- :ref:`Improving the Connection Between the Client and NameNode Using Current Active Cache ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + improving_write_performance + improving_read_performance_using_client_metadata_cache + improving_the_connection_between_the_client_and_namenode_using_current_active_cache diff --git a/doc/component-operation-guide-lts/source/using_hdfs/index.rst b/doc/component-operation-guide-lts/source/using_hdfs/index.rst new file mode 100644 index 0000000..33c434b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/index.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_0790.html + +.. _mrs_01_0790: + +Using HDFS +========== + +- :ref:`Configuring Memory Management ` +- :ref:`Creating an HDFS Role ` +- :ref:`Using the HDFS Client ` +- :ref:`Running the DistCp Command ` +- :ref:`Overview of HDFS File System Directories ` +- :ref:`Changing the DataNode Storage Directory ` +- :ref:`Configuring HDFS Directory Permission ` +- :ref:`Configuring NFS ` +- :ref:`Planning HDFS Capacity ` +- :ref:`Configuring ulimit for HBase and HDFS ` +- :ref:`Balancing DataNode Capacity ` +- :ref:`Configuring Replica Replacement Policy for Heterogeneous Capacity Among DataNodes ` +- :ref:`Configuring the Number of Files in a Single HDFS Directory ` +- :ref:`Configuring the Recycle Bin Mechanism ` +- :ref:`Setting Permissions on Files and Directories ` +- :ref:`Setting the Maximum Lifetime and Renewal Interval of a Token ` +- :ref:`Configuring the Damaged Disk Volume ` +- :ref:`Configuring Encrypted Channels ` +- :ref:`Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable ` +- :ref:`Configuring the NameNode Blacklist ` +- :ref:`Optimizing HDFS NameNode RPC QoS ` +- :ref:`Optimizing HDFS DataNode RPC QoS ` +- :ref:`Configuring Reserved Percentage of Disk Usage on DataNodes ` +- :ref:`Configuring HDFS NodeLabel ` +- :ref:`Configuring HDFS DiskBalancer ` +- :ref:`Performing Concurrent Operations on HDFS Files ` +- :ref:`Introduction to HDFS Logs ` +- :ref:`HDFS Performance Tuning ` +- :ref:`FAQ ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + configuring_memory_management + creating_an_hdfs_role + using_the_hdfs_client + running_the_distcp_command + overview_of_hdfs_file_system_directories + changing_the_datanode_storage_directory + configuring_hdfs_directory_permission + configuring_nfs + planning_hdfs_capacity + configuring_ulimit_for_hbase_and_hdfs + balancing_datanode_capacity + configuring_replica_replacement_policy_for_heterogeneous_capacity_among_datanodes + configuring_the_number_of_files_in_a_single_hdfs_directory + configuring_the_recycle_bin_mechanism + setting_permissions_on_files_and_directories + setting_the_maximum_lifetime_and_renewal_interval_of_a_token + configuring_the_damaged_disk_volume + configuring_encrypted_channels + reducing_the_probability_of_abnormal_client_application_operation_when_the_network_is_not_stable + configuring_the_namenode_blacklist + optimizing_hdfs_namenode_rpc_qos + optimizing_hdfs_datanode_rpc_qos + configuring_reserved_percentage_of_disk_usage_on_datanodes + configuring_hdfs_nodelabel + configuring_hdfs_diskbalancer + performing_concurrent_operations_on_hdfs_files + introduction_to_hdfs_logs + hdfs_performance_tuning/index + faq/index diff --git a/doc/component-operation-guide-lts/source/using_hdfs/introduction_to_hdfs_logs.rst b/doc/component-operation-guide-lts/source/using_hdfs/introduction_to_hdfs_logs.rst new file mode 100644 index 0000000..7a6a726 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/introduction_to_hdfs_logs.rst @@ -0,0 +1,127 @@ +:original_name: mrs_01_0828.html + +.. _mrs_01_0828: + +Introduction to HDFS Logs +========================= + +Log Description +--------------- + +**Log path**: The default path of HDFS logs is **/var/log/Bigdata/hdfs/**\ *Role name*. + +- NameNode: **/var/log/Bigdata/hdfs/nn** (run logs) and **/var/log/Bigdata/audit/hdfs/nn** (audit logs) +- DataNode: **/var/log/Bigdata/hdfs/dn** (run logs) and **/var/log/Bigdata/audit/hdfs/dn** (audit logs) +- ZKFC: **/var/log/Bigdata/hdfs/zkfc** (run logs) and **/var/log/Bigdata/audit/hdfs/zkfc** (audit logs) +- JournalNode: **/var/log/Bigdata/hdfs/jn** (run logs) and **/var/log/Bigdata/audit/hdfs/jn** (audit logs) +- Router: **/var/log/Bigdata/hdfs/router** (run logs) and **/var/log/Bigdata/audit/hdfs/router** (audit logs) +- HttpFS: **/var/log/Bigdata/hdfs/httpfs** (run logs) and **/var/log/Bigdata/audit/hdfs/httpfs** (audit logs) + +**Log archive rule**: The automatic HDFS log compression function is enabled. By default, when the size of logs exceeds 100 MB, logs are automatically compressed into a log file named in the following format: *---.log | HDFS system log, which records most of the logs generated when the HDFS system is running. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hadoop---.out | Log that records the HDFS running environment information. 
| + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hadoop.log | Log that records the operation of the Hadoop client. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-period-check.log | Log that records scripts that are executed periodically, including automatic balancing, data migration, and JournalNode data synchronization detection. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | ----gc.log | Garbage collection log file | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | postinstallDetail.log | Work log before the HDFS service startup and after the installation. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-service-check.log | Log that records whether the HDFS service starts successfully. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-set-storage-policy.log | Log that records the HDFS data storage policies. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | cleanupDetail.log | Log that records the cleanup logs about the uninstallation of the HDFS service. 
| + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | prestartDetail.log | Log that records cluster operations before the HDFS service startup. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-recover-fsimage.log | Recovery log of the NameNode metadata. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | datanode-disk-check.log | Log that records the disk status check during the cluster installation and use. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-availability-check.log | Log that check whether the HDFS service is available. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-backup-fsimage.log | Backup log of the NameNode metadata. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | startDetail.log | Detailed log that records the HDFS service startup. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-blockplacement.log | Log that records the placement policy of HDFS blocks. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | upgradeDetail.log | Upgrade logs. 
| + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-clean-acls-java.log | Log that records the clearing of deleted roles' ACL information by HDFS. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-haCheck.log | Run log that checks whether the NameNode in active or standby state has obtained scripts. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | -jvmpause.log | Log that records JVM pauses during process running. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hadoop--balancer-.log | Run log of HDFS automatic balancing. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hadoop--balancer-.out | Log that records information of the environment where HDFS executes automatic balancing. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-switch-namenode.log | Run log that records the HDFS active/standby switchover. 
| + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hdfs-router-admin.log | Run log of the mount table management operation | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tomcat logs | hadoop-omm-host1.out, httpfs-catalina..log, httpfs-host-manager..log, httpfs-localhost..log, httpfs-manager..log, localhost_access_web_log.log | Tomcat run log | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | hdfs-audit-.log | Audit log that records the HDFS operations (such as creating, deleting, modifying and querying files). | + | | | | + | | ranger-plugin-audit.log | | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | SecurityAuth.audit | HDFS security audit log. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` lists the log levels supported by HDFS. The log levels include FATAL, ERROR, WARN, INFO, and DEBUG. Logs of which the levels are higher than or equal to the set level will be printed by programs. The higher the log level is set, the fewer the logs are recorded. + +.. _mrs_01_0828__en-us_topic_0000001219149367_t9a69df8da9a84f41bb6fd3e008d7a3b8: + +.. table:: **Table 2** Log levels + + ===== ============================================================== + Level Description + ===== ============================================================== + FATAL Indicates the critical error information about system running. + ERROR Indicates the error information about system running. + WARN Indicates that the current event processing exists exceptions. + INFO Indicates that the system and events are running properly. + DEBUG Indicates the system and system debugging information. + ===== ============================================================== + +To modify log levels, perform the following operations: + +#. Go to the **All Configurations** page of HDFS by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. On the left menu bar, select the log menu of the target role. +#. Select a desired log level. +#. 
Save the configuration. In the displayed dialog box, click **OK** to make the configurations take effect. + + .. note:: + + The configurations take effect immediately without restarting the service. + +Log Formats +----------- + +The following table lists the HDFS log formats. + +.. table:: **Table 3** Log formats + + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +===========+========================================================================================================================================================+========================================================================================================================================================================================================================================================================================================================================================================================+ + | Run log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2015-01-26 18:43:42,840 \| INFO \| IPC Server handler 40 on 8020 \| Rolling edit logs \| org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1096) | + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2015-01-26 18:44:42,607 \| INFO \| IPC Server handler 32 on 8020 \| allowed=true ugi=hbase (auth:SIMPLE) ip=/10.177.112.145 cmd=getfileinfo src=/hbase/WALs/hghoulaslx410,16020,1421743096083/hghoulaslx410%2C16020%2C1421743096083.1422268722795 dst=null perm=null \| org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950) | + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_datanode_rpc_qos.rst 
b/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_datanode_rpc_qos.rst
new file mode 100644
index 0000000..95aa26a
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_datanode_rpc_qos.rst
@@ -0,0 +1,26 @@
+:original_name: mrs_01_1673.html
+
+.. _mrs_01_1673:
+
+Optimizing HDFS DataNode RPC QoS
+================================
+
+Scenario
+--------
+
+When the speed at which the client writes data to HDFS exceeds the disk bandwidth of the DataNode, the disk bandwidth is fully occupied and the DataNode stops responding. The client can back off only by canceling or restoring the channel, which results in write failures and unnecessary channel recovery operations.
+
+Configuration
+-------------
+
+The configuration parameter **dfs.pipeline.ecn** is introduced for this scenario. When it is enabled, the DataNode sends a signal on the write channel when the channel is overloaded, and the client can back off based on this blocking signal to prevent the system from being overloaded. This makes the channel more stable and reduces unnecessary cancellation and recovery operations. After receiving the signal, the client backs off for a period of time (5,000 ms) and then adjusts the backoff time based on the related filter (the maximum backoff time is 50,000 ms).
+
+Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `.
+
+.. table:: **Table 1** DN ECN configuration
+
+   +------------------+----------------------------------------------------------------------------------+---------------+
+   | Parameter        | Description                                                                        | Default Value |
+   +==================+====================================================================================+===============+
+   | dfs.pipeline.ecn | After configuration, the DataNode can send blocking notifications to the client.  | false         |
+   +------------------+----------------------------------------------------------------------------------+---------------+
diff --git a/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_namenode_rpc_qos.rst b/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_namenode_rpc_qos.rst
new file mode 100644
index 0000000..864eeb8
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hdfs/optimizing_hdfs_namenode_rpc_qos.rst
@@ -0,0 +1,74 @@
+:original_name: mrs_01_1672.html
+
+.. _mrs_01_1672:
+
+Optimizing HDFS NameNode RPC QoS
+================================
+
+Scenarios
+---------
+
+Some deployed Hadoop clusters become faulty because the NameNode is overloaded and unresponsive.
+
+This problem is caused by the initial design of Hadoop: the NameNode functions as an independent component and coordinates various HDFS operations in its namespace, including obtaining data block locations, listing directories, and creating files. The NameNode receives HDFS operations as RPC calls and places them in a FIFO call queue for read threads to process. Requests in the FIFO call queue are served on a first-in, first-out basis, so users who perform more I/O operations are served for more time than users who perform fewer I/O operations. As a result, the FIFO queue is unfair and causes delays.
+
+
+..
figure:: /_static/images/en-us_image_0000001295740212.png + :alt: **Figure 1** NameNode request processing based on the FIFO call queue + + **Figure 1** NameNode request processing based on the FIFO call queue + +The unfair problem and delaying mentioned before can be improved by replacing the FIFO queue with a new type of queue called FairCallQueue. In this way, FAIR queues assign incoming RPC calls to multiple queues based on the scale of the caller's call. The scheduling module tracks the latest calls and assigns a higher priority to users with a smaller number of calls. + + +.. figure:: /_static/images/en-us_image_0000001296060016.png + :alt: **Figure 2** NameNode request processing based on FAIRCallQueue + + **Figure 2** NameNode request processing based on FAIRCallQueue + +Configuration Description +------------------------- + +- FairCallQueue ensures quality of service (QoS) by internally adjusting the order in which RPCs are invoked. + + This queue consists of the following parts: + + #. DecayRpcScheduler: used to provide priority values from 0 to N (the value 0 indicates the highest priority). + #. Multi-level queues (located in the FairCallQueue): used to ensure that queues are invoked in order of priority. + #. Multi-channel converters (provided with Weighted Round Robin Multiplexer): used to provide logic control for queue selection. + + After the FairCallQueue is configured, the control module determines the sub-queue to which the received invoking is allocated. The current scheduling module is DecayRpcScheduler, which only continuously tracks the priority numbers of various calls and periodically reduces these numbers. + + Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + + .. table:: **Table 1** FairCallQueue parameters + + +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+ + | Parameter | Description | Default Value | + +===============================+==========================================================================================================================================+==========================================+ + | ipc.\ **.callqueue.impl | Specifies the queue implementation class. You need to run the **org.apache.hadoop.ipc.FairCallQueue** command to enable the QoS feature. | java.util.concurrent.LinkedBlockingQueue | + +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+ + +- RPC BackOff + + Backoff is one of the FairCallQueue functions. It requires the client to retry operations (such as creating, deleting, and opening a file) after a period of time. When the backoff occurs, the RCP server throws RetriableException. The FairCallQueue performs backoff in either of the following cases: + + - The queue is full, that is, there are many client calls in the queue. + - The queue response time is longer than the threshold time (specified by the **ipc..decay-scheduler.backoff.responsetime.thresholds** parameter). + + .. 
table:: **Table 2** RPC Backoff configuration + + +----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ + | Parameter | Description | Default Value | + +================================================================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+=========================+ + | ipc.\ **.backoff.enable | Specifies whether to enable the backoff. When the current application contains a large number of user callings, the RPC request is blocked if the connection limit of the operating system is not reached. Alternatively, when the RPC or NameNode is heavily loaded, some explicit exceptions can be thrown back to the client based on certain policies. The client can understand these exceptions and perform exponential rollback, which is another implementation of the RetryInvocationHandler class. | false | + +----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ + | ipc.\ **.decay-scheduler.backoff.responsetime.enable | Indicate whether to enable the backoff based on the average queue response time. | false | + +----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ + | ipc.\ **.decay-scheduler.backoff.responsetime.thresholds | Configure the response time threshold for each queue. The response time threshold must match the number of priorities (the value of **ipc. .faircallqueue.priority-levels**). 
Unit: millisecond | 10000,20000,30000,40000 | + +----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ + +.. note:: + + - **** indicates the RPC port configured on the NameNode. + - The backoff function based on the response time takes effect only when **ipc. .backoff.enable** is set to **true**. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst b/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst new file mode 100644 index 0000000..9e964a8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst @@ -0,0 +1,76 @@ +:original_name: mrs_01_0795.html + +.. _mrs_01_0795: + +Overview of HDFS File System Directories +======================================== + +This section describes the directory structure in HDFS, as shown in the following table. + +.. table:: **Table 1** Directory structure of the HDFS file system + + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | Path | Type | Function | Whether the Directory Can Be Deleted | Deletion Consequence | + +====================================================+=================================+==========================================================================================================================================================================================================================================================================================================================================================================================+======================================+===============================================================================+ + | /tmp/spark2x/sparkhive-scratch | Fixed directory | Stores temporary files of metastore session in Spark2x JDBCServer. | No | Failed to run the task. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions that are executed in CLI mode using Spark2x CLI. | No | Failed to run the task. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container log files cannot be viewed. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/carbon/ | Fixed directory | Stores the abnormal data in this directory if abnormal CarbonData data exists during data import. | Yes | Error data is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/Loader-${*Job name*}_${*MR job ID*} | Temporary directory | Stores the region information about Loader HBase bulkload jobs. The data is automatically deleted after the job running is completed. | No | Failed to run the Loader HBase Bulkload job. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores the ResourceManager running information. | Yes | Status information is lost after ResourceManager is restarted. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of ApplicationMaster running jobs. | No | Services are running improperly. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files in the **/tmp/hadoop-yarn/staging** directory after all tasks are executed. | No | MR task logs are lost. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/hadoop-yarn/staging/history/done | Fixed directory | The periodic scanning thread periodically moves the **done_intermediate** log file to the **done** directory. | No | MR task logs are lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/mr-history | Fixed directory | Stores the historical record files that are pre-loaded. | No | Historical MR task log data is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | Failed to run the current task. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | Failed to start the executor. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/spark2x/jars | Fixed directory | Stores running dependency packages of the Spark2x executor. | No | Failed to start the executor. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/loader | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | Failed to execute the HBase job. Or dirty data is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/loader/etl_dirty_data_dir | | | | | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/loader/etl_hbase_putlist_tmp | | | | | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/loader/etl_hbase_tmp | | | | | + 
+----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/oozie | Fixed directory | Stores dependent libraries required for Oozie running, which needs to be manually uploaded. | No | Failed to schedule Oozie. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/mapred/hadoop-mapreduce-*3.1.1*.tar.gz | Fixed files | Stores JAR files used by the distributed MR cache. | No | The MR distributed cache function is unavailable. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/hive | Fixed directory | Stores Hive-related data by default, including the depended Spark lib package and default table data storage path. | No | User data is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/omm-bulkload | Temporary directory | Stores HBase batch import tools temporarily. | No | Failed to import HBase tasks in batches. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /user/hbase | Temporary directory | Stores HBase batch import tools temporarily. | No | Failed to import HBase tasks in batches. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /spark2xJobHistory2x | Fixed directory | Stores Spark2.x eventlog data. | No | The History Server service is unavailable, and the task fails to be executed. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /flume | Fixed directory | Stores data collected by Flume from HDFS. | No | Flume runs improperly. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. 
| Yes | Log information is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /tenant | Created when a tenant is added. | Directory of a tenant in the HDFS. By default, the system automatically creates a folder in the **/tenant** directory based on the tenant name. For example, the default HDFS storage directory for **ta1** is **tenant/ta1**. When a tenant is created for the first time, the system creates the **/tenant** directory in the HDFS root directory. You can customize the storage path. | No | The tenant account is unavailable. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | Failed to run the WebHCat tasks. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. | + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ + | /hbaseFileStream | Fixed directory | Stores HFS files. | No | The HFS file is lost and cannot be restored. 
| + +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/performing_concurrent_operations_on_hdfs_files.rst b/doc/component-operation-guide-lts/source/using_hdfs/performing_concurrent_operations_on_hdfs_files.rst new file mode 100644 index 0000000..6b17d2d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/performing_concurrent_operations_on_hdfs_files.rst @@ -0,0 +1,101 @@ +:original_name: mrs_01_1684.html + +.. _mrs_01_1684: + +Performing Concurrent Operations on HDFS Files +============================================== + +Scenario +-------- + +Performing this operation can concurrently modify file and directory permissions and access control tools in a cluster. + +Impact on the System +-------------------- + +Performing concurrent file modification operations in a cluster has adverse impacts on the cluster performance. Therefore, you are advised to do so when the cluster is idle. + +Prerequisites +------------- + +- The HDFS client or clients including HDFS has been installed. For example, the installation directory is **/opt/client**. +- Service component users are created by the administrator as required. In security mode, machine-machine users need to download the keytab file. A human-machine user needs to change the password upon the first login. (This operation is not required in normal mode.) + +Procedure +--------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, the user executing the DistCp command must belong to the **supergroup** group and run the following command to perform user authentication. In normal mode, user authentication is not required. + + **kinit** *Component service user* + +#. Increase the JVM size of the client to prevent out of memory (OOM). (32 GB is recommended for 100 million files.) + + .. note:: + + The HDFS client exits abnormally and the error message "java.lang.OutOfMemoryError" is displayed after the HDFS client command is executed. + + This problem occurs because the memory required for running the HDFS client exceeds the preset upper limit (128 MB by default). You can change the memory upper limit of the client by modifying **CLIENT_GC_OPTS** in **\ **/HDFS/component_env**. For example, if you want to set the upper limit to 1 GB, run the following command: + + CLIENT_GC_OPTS="-Xmx1G" + + After the modification, run the following command to make the modification take effect: + + **source** <*Client installation path*>/**/bigdata_env** + +#. Run the concurrent commands shown in the following table. 
+ + +----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | Command | Description | Function | + +================================================================================================================+===========================================================================================================================+============================================================================+ + | hdfs quickcmds [-t threadsNumber] [-p principal] [-k keytab] -setrep ... | **threadsNumber** indicates the number of concurrent threads. The default value is the number of vCPUs of the local host. | Used to concurrently set the number of copies of all files in a directory. | + | | | | + | | **principal** indicates the Kerberos user. | | + | | | | + | | **keytab** indicates the Keytab file. | | + | | | | + | | **rep** indicates the number of replicas. | | + | | | | + | | **path** indicates the HDFS directory. | | + +----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | hdfs quickcmds [-t threadsNumber] [-p principal] [-k keytab] -chown [owner][:[group]] ... | **threadsNumber** indicates the number of concurrent threads. The default value is the number of vCPUs of the local host. | Used to concurrently set the owner group of all files in the directory. | + | | | | + | | **principal** indicates the Kerberos user. | | + | | | | + | | **keytab** indicates the Keytab file. | | + | | | | + | | **owner** indicates the owner. | | + | | | | + | | **group** indicates the group to which the user belongs. | | + | | | | + | | **path** indicates the HDFS directory. | | + +----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | hdfs quickcmds [-t threadsNumber] [-p principal] [-k keytab] -chmod ... | **threadsNumber** indicates the number of concurrent threads. The default value is the number of vCPUs of the local host. | Used to concurrently set permissions for all files in a directory. | + | | | | + | | **principal** indicates the Kerberos user. | | + | | | | + | | **keytab** indicates the Keytab file. | | + | | | | + | | **mode** indicates the permission (for example, 754). | | + | | | | + | | **path** indicates the HDFS directory. | | + +----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+ + | hdfs quickcmds [-t threadsNumber] [-p principal] [-k keytab] -setfacl [{-b|-k} {-m|-x } ``...]|[--``\ set ...] | **threadsNumber** indicates the number of concurrent threads. The default value is the number of vCPUs of the local host. 
| Used to concurrently set ACL information for all files in a directory. | + | | | | + | | **principal** indicates the Kerberos user. | | + | | | | + | | **keytab** indicates the Keytab file. | | + | | | | + | | **acl_spec** indicates the ACL list separated by commas (,). | | + | | | | + | | **path** indicates the HDFS directory. | | + +----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/planning_hdfs_capacity.rst b/doc/component-operation-guide-lts/source/using_hdfs/planning_hdfs_capacity.rst new file mode 100644 index 0000000..a4fef71 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/planning_hdfs_capacity.rst @@ -0,0 +1,127 @@ +:original_name: mrs_01_0799.html + +.. _mrs_01_0799: + +Planning HDFS Capacity +====================== + +In HDFS, DataNode stores user files and directories as blocks, and file objects are generated on the NameNode to map each file, directory, and block on the DataNode. + +The file objects on the NameNode require certain memory capacity. The memory consumption linearly increases as more file objects generated. The number of file objects on the NameNode increases and the objects consume more memory when the files and directories stored on the DataNode increase. In this case, the existing hardware may not meet the service requirement and the cluster is difficult to be scaled out. + +Capacity planning of the HDFS that stores a large number of files is to plan the capacity specifications of the NameNode and DataNode and to set parameters according to the capacity plans. + +Capacity Specifications +----------------------- + +- NameNode capacity specifications + + Each file object on the NameNode corresponds to a file, directory, or block on the DataNode. + + A file uses at least one block. The default size of a block is **134,217,728**, that is, 128 MB, which can be set in the **dfs.blocksize** parameter. By default, a file whose size is less than 128 MB occupies only one block. If the file size is greater than 128 MB, the number of occupied blocks is the file size divided by 128 MB (Number of occupied blocks = File size/128). The directories do not occupy any blocks. + + Based on **dfs.blocksize**, the number of file objects on the NameNode is calculated as follows: + + .. table:: **Table 1** Number of NameNode file objects + + +--------------------------------+---------------------------------------------------------+ + | Size of a File | Number of File Objects | + +================================+=========================================================+ + | < 128 MB | 1 (File) + 1 (Block) = 2 | + +--------------------------------+---------------------------------------------------------+ + | > 128 MB (for example, 128 GB) | 1 (File) + 1,024 (128 GB/128 MB = 1,024 blocks) = 1,025 | + +--------------------------------+---------------------------------------------------------+ + + The maximum number of file objects supported by the active and standby NameNodes is 300,000,000 (equivalent to 150,000,000 small files). **dfs.namenode.max.objects** specifies the number of file objects that can be generated in the system. The default value is **0**, which indicates that the number of generated file objects is not limited. 
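+
+   As a worked example, 1,000,000 files of 1 GB each occupy 1 GB/128 MB = 8 blocks per file, so they generate roughly 1,000,000 x (1 + 8) = 9,000,000 file objects on the NameNode. This is far below the 300,000,000 upper limit, but it determines the amount of NameNode memory that must be planned (see the JVM configuration tables in the following sections).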
+ +- DataNode capacity specifications + + In HDFS, blocks are stored on the DataNode as copies. The default number of copies is **3**, which can be set in the **dfs.replication** parameter. + + The number of blocks stored on all DataNode role instances in the cluster can be calculated based on the following formula: Number of HDFS blocks x 3 Average number of saved blocks = Number of HDFS blocks x 3/Number of DataNodes + + .. table:: **Table 2** DataNode specifications + + +----------------------------------------------------------------------------------------------------------------+----------------+ + | Item | Specifications | + +================================================================================================================+================+ + | Maximum number of block supported by a DataNode instance | 5,000,000 | + +----------------------------------------------------------------------------------------------------------------+----------------+ + | Maximum number of block supported by a disk on a DataNode instance | 500,000 | + +----------------------------------------------------------------------------------------------------------------+----------------+ + | Minimum number of disks required when the number of block supported by a DataNode instance reaches the maximum | 10 | + +----------------------------------------------------------------------------------------------------------------+----------------+ + + .. table:: **Table 3** Number of DataNodes + + ===================== ================================ + Number of HDFS Blocks Minimum Number of DataNode Roles + ===================== ================================ + 10,000,000 10,000,000 \*3/5,000,000 = 6 + 50,000,000 50,000,000 \*3/5,000,000 = 30 + 100,000,000 100,000,000 \*3/5,000,000 = 60 + ===================== ================================ + +Setting Memory Parameters +------------------------- + +- Configuration rules of the NameNode JVM parameter + + Default value of the NameNode JVM parameter **GC_OPTS**: + + -Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Djdk.tls.ephemeralDHKeySize=2048 + + The number of NameNode files is proportional to the used memory size of the NameNode. When file objects change, you need to change **-Xms2G -Xmx4G -XX:NewSize=128M --XX:MaxNewSize=256M** in the default value. The following table lists the reference values. + + .. 
table:: **Table 4** NameNode JVM configuration + + +------------------------+------------------------------------------------------+ + | Number of File Objects | Reference Value | + +========================+======================================================+ + | 10,000,000 | -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M | + +------------------------+------------------------------------------------------+ + | 20,000,000 | -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G | + +------------------------+------------------------------------------------------+ + | 50,000,000 | -Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G | + +------------------------+------------------------------------------------------+ + | 100,000,000 | -Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G | + +------------------------+------------------------------------------------------+ + | 200,000,000 | -Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G | + +------------------------+------------------------------------------------------+ + | 300,000,000 | -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G | + +------------------------+------------------------------------------------------+ + +- Configuration rules of the DataNode JVM parameter + + Default value of the DataNode JVM parameter **GC_OPTS**: + + -Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Djdk.tls.ephemeralDHKeySize=2048 + + The average number of blocks stored in each DataNode instance in the cluster is: Number of HDFS blocks x 3/Number of DataNodes. If the average number of blocks changes, you need to change **-Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M** in the default value. The following table lists the reference values. + + .. table:: **Table 5** DataNode JVM configuration + + +-------------------------------------------------+----------------------------------------------------+ + | Average Number of Blocks in a DataNode Instance | Reference Value | + +=================================================+====================================================+ + | 2,000,000 | -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M | + +-------------------------------------------------+----------------------------------------------------+ + | 5,000,000 | -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G | + +-------------------------------------------------+----------------------------------------------------+ + + **Xmx** specifies memory which corresponds to the threshold of the number of DataNode blocks, and each GB memory supports a maximum of 500,000 DataNode blocks. Set the memory as required. + +Viewing the HDFS Capacity Status +-------------------------------- + +- NameNode information + + Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **NameNode(Active)**, and click **Overview** to view information like the number of file objects, files, directories, and blocks in HDFS in **Summary** area. 
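+
+   If a quick command-line check is preferred, the standard HDFS client tools report similar figures (the exact output format depends on the Hadoop version). For example, **hdfs dfs -count /** prints the total numbers of directories and files, and **hdfs dfsadmin -report** (which requires HDFS administrator permissions) summarizes capacity usage and DataNode status:
+
+   .. code-block::
+
+      hdfs dfs -count /
+      hdfs dfsadmin -report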
+ +- DataNode information + + Log in to FusionInsight Manager, choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **NameNode(Active)**, and click **DataNodes** to view the number of blocks on all DataNodes that report alarms. + +- Alarm information + + Check whether the alarms whose IDs are 14007, 14008, and 14009 are generated and change the alarm thresholds as required. diff --git a/doc/component-operation-guide-lts/source/using_hdfs/reducing_the_probability_of_abnormal_client_application_operation_when_the_network_is_not_stable.rst b/doc/component-operation-guide-lts/source/using_hdfs/reducing_the_probability_of_abnormal_client_application_operation_when_the_network_is_not_stable.rst new file mode 100644 index 0000000..846c2de --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/reducing_the_probability_of_abnormal_client_application_operation_when_the_network_is_not_stable.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_0811.html + +.. _mrs_01_0811: + +Reducing the Probability of Abnormal Client Application Operation When the Network Is Not Stable +================================================================================================ + +Scenario +-------- + +Clients probably encounter running errors when the network is not stable. Users can adjust the following parameter values to improve the running efficiency. + +Configuration Description +------------------------- + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** Parameter description + + +--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +============================================+=======================================================================================================================================================================================================+=======================+ + | ha.health-monitor.rpc-timeout.ms | Timeout interval during the NameNode health check performed by ZKFC. Increasing this value can prevent dual active NameNodes and reduce the probability of application running exceptions on clients. | 180,000 | + | | | | + | | Unit: millisecond. Value range: 30,000 to 3,600,000 | | + +--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | ipc.client.connect.max.retries.on.timeouts | Number of retry times when the socket connection between a server and a client times out. | 45 | + | | | | + | | Value range: 1 to 256 | | + +--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | ipc.client.connect.timeout | Timeout interval of the socket connection between a client and a server. Increasing the value of this parameter increases the timeout interval for setting up a connection. 
| 20,000 | + | | | | + | | Unit: millisecond. Value range: 1 to 3,600,000 | | + +--------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/running_the_distcp_command.rst b/doc/component-operation-guide-lts/source/using_hdfs/running_the_distcp_command.rst new file mode 100644 index 0000000..e982b3e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/running_the_distcp_command.rst @@ -0,0 +1,255 @@ +:original_name: mrs_01_0794.html + +.. _mrs_01_0794: + +Running the DistCp Command +========================== + +Scenario +-------- + +DistCp is a tool used to perform large-amount data replication between clusters or in a cluster. It uses MapReduce tasks to implement distributed copy of a large amount of data. + +Prerequisites +------------- + +- The Yarn client or a client that contains Yarn has been installed. For example, the installation directory is **/opt/client**. +- Service users of each component are created by the system administrator based on service requirements. In security mode, machine-machine users need to download the keytab file. A human-machine user must change the password upon the first login. (Not involved in normal mode) +- To copy data between clusters, you need to enable the inter-cluster data copy function on both clusters. + +Procedure +--------- + +#. Log in to the node where the client is installed. + +#. Run the following command to go to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, the user group to which the user executing the DistCp command belongs must be **supergroup** and the user run the following command to perform user authentication. In normal mode, user authentication is not required. + + **kinit** *Component service user* + +#. Run the DistCp command. The following provides an example: + + **hadoop distcp hdfs://hacluster/source hdfs://hacluster/target** + +Common Usage of DistCp +---------------------- + +#. The following is an example of the commonest usage of DistCp: + + .. code-block:: + + hadoop distcp -numListstatusThreads 40 -update -delete -prbugpaxtq hdfs://cluster1/source hdfs://cluster2/target + + .. note:: + + In the preceding command: + + - **-numListstatusThreads** specifies the number of threads for creating the list of 40 copied files. + + - **-update -delete** specifies that files at the source location and the target location are synchronized, and that files with excessive target locations are deleted. If you need to copy files incrementally, delete **-delete**. + + - If **-prbugpaxtq** and **-update** are used, it indicates that the status information of the copied file is also updated. + + - **hdfs://cluster1/source** indicates the source location, and **hdfs://cluster2/target** indicates the target location. + +#. The following is an example of data copy between clusters: + + .. code-block:: + + hadoop distcp hdfs://cluster1/foo/bar hdfs://cluster2/bar/foo + + .. note:: + + The network between cluster1 and cluster2 must be reachable, and the two clusters must use the same HDFS version or compatible HDFS versions. + +#. 
The following are multiple examples of data copy in a source directory: + + .. code-block:: + + hadoop distcp hdfs://cluster1/foo/a \ + hdfs://cluster1/foo/b \ + hdfs://cluster2/bar/foo + + The preceding command is used to copy the folders a and b of cluster1 to the **/bar/foo** directory of cluster2. The effect is equivalent to that of the following commands: + + .. code-block:: + + hadoop distcp -f hdfs://cluster1/srclist \ + hdfs://cluster2/bar/foo + + The content of **srclist** is as follows. Before running the DistCp command, upload the **srclist** file to HDFS. + + .. code-block:: + + hdfs://cluster1/foo/a + hdfs://cluster1/foo/b + +#. **-update** indicates that a to-be-copied file does not exist in the target location, or the content of the copied file in the target location is updated; and **-overwrite** is used to overwrite existing files in the target location. + + The following is an example of the difference between no option and any one of the two options (either **update** or **overwrite**) that is added: + + Assume that the structure of a file at the source location is as follows: + + .. code-block:: + + hdfs://cluster1/source/first/1 + hdfs://cluster1/source/first/2 + hdfs://cluster1/source/second/10 + hdfs://cluster1/source/second/20 + + Commands without options are as follows: + + .. code-block:: + + hadoop distcp hdfs://cluster1/source/first hdfs://cluster1/source/second hdfs://cluster2/target + + By default, the preceding command creates the **first** and **second** folders at the target location. Therefore, the copy results are as follows: + + .. code-block:: + + hdfs://cluster2/target/first/1 + hdfs://cluster2/target/first/2 + hdfs://cluster2/target/second/10 + hdfs://cluster2/target/second/20 + + The command with any one of the two options (for example, **update**) is as follows: + + .. code-block:: + + hadoop distcp -update hdfs://cluster1/source/first hdfs://cluster1/source/second hdfs://cluster2/target + + The preceding command copies only the content at the source location to the target location. Therefore, the copy results are as follows: + + .. code-block:: + + hdfs://cluster2/target/1 + hdfs://cluster2/target/2 + hdfs://cluster2/target/10 + hdfs://cluster2/target/20 + + .. note:: + + - If files with the same name exist in multiple source locations, the DistCp command fails. + + - If neither **update** nor **overwrite** is used and the file to be copied already exists in the target location, the file will be skipped. + - When **update** is used, if the file to be copied already exists in the target location but the file content is different, the file content in the target location is updated. + - When **overwrite** is used, if the file to be copied already exists in the target location, the file in the target location is still overwritten. + +#. The following table describes other command options: + + .. 
table:: **Table 1** Other command options + + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Option | Description | + +===================================+==============================================================================================================================================================================================================================================================================================================+ + | -p[rbugpcaxtq] | When **-update** is also used, the status information of a copied file is updated even if the content of the copied file is not updated. | + | | | + | | **r**: number of copies | + | | | + | | **b**: size of a block | + | | | + | | **u**: user to which the files belong | + | | | + | | **g**: user group to which the user belongs | + | | | + | | **p**: permission | + | | | + | | **c**: check and type | + | | | + | | **a**: access control | + | | | + | | **t**: timestamp | + | | | + | | **q**: quota information | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -i | Failures ignored during copying | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -log | Path of the specified log | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -v | Additional information in the specified log | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -m | Maximum number of concurrent copy tasks that can be executed at the same time | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -numListstatusThreads | Number of threads for constituting the list of copied files. This option increases the running speed of DistCp. 
| + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -overwrite | File at the target location that is to be overwritten | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -update | A file at the target location is updated if the size and check of a file at the source location are different from those of the file at the target location. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -append | When **-update** is also used, the content of the file at the source location is added to the file at the target location. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -f | Content of the **** file is used as the file list to be copied. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -filters | A local file is specified whose content contains multiple regular expressions. If the file to be copied matches a regular expression, the file is not copied. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -async | The **distcp** command is run asynchronously. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -atomic {-tmp } | An atomic copy can be performed. You can add a temporary directory during copying. 
| + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -bandwidth | The transmission bandwidth of each copy task. Unit: MB/s. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -delete | The files that exist in the target location is deleted but do not exist in the source location. This option is usually used with **-update**, and indicates that files at the source location are synchronized with those at the target location and the redundant files at the target location are deleted. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -diff | The differences between the old and new versions are copied to a file in the old version at the target location. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -skipcrccheck | Whether to skip the cyclic redundancy check (CRC) between the source file and the target file. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -strategy {dynamic|uniformsize} | The policy for copying a task. The default policy is **uniformsize**, that is, each copy task copies the same number of bytes. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -preserveec | Whether to retain the erasure code (EC) policy. | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +FAQs of DistCp +-------------- + +#. When you run the DistCp command, if the content of some copied files is large, you are advised to change the timeout period of MapReduce that executes the copy task. 
It can be implemented by specifying the **mapreduce.task.timeout** in the DistCp command. For example, run the following command to change the timeout to 30 minutes: + + .. code-block:: + + hadoop distcp -Dmapreduce.task.timeout=1800000 hdfs://cluster1/source hdfs://cluster2/target + + Or, you can also use **filters** to exclude the large files out of the copy process. The command example is as follows: + + .. code-block:: + + hadoop distcp -filters /opt/client/filterfile hdfs://cluster1/source hdfs://cluster2/target + + In the preceding command, *filterfile* indicates a local file, which contains multiple expressions used to match the path of a file that is not copied. The following is an example: + + .. code-block:: + + .*excludeFile1.* + .*excludeFile2.* + +#. If the DistCp command unexpectedly quits, the error message "java.lang.OutOfMemoryError" is displayed. + + This is because the memory required for running the copy command exceeds the preset memory limit (default value: 128 MB). You can change the memory upper limit of the client by modifying **CLIENT_GC_OPTS** in **\ **/HDFS/component_env**. For example, if you want to set the memory upper limit to 1 GB, refer to the following configuration: + + .. code-block:: + + CLIENT_GC_OPTS="-Xmx1G" + + After the modification, run the following command to make the modification take effect: + + **source** {*Client installation path*}\ **/bigdata_env** + +#. When the dynamic policy is used to run the DistCp command, the command exits unexpectedly and the error message "Too many chunks created with splitRatio" is displayed. + + The cause of this problem is that the value of **distcp.dynamic.max.chunks.tolerable** (default value: 20,000) is less than the value of **distcp.dynamic.split.ratio** (default value: 2) multiplied by the number of Maps. This problem occurs when the number of Maps exceeds 10,000. You can use the **-m** parameter to reduce the number of Maps to less than 10,000. + + .. code-block:: + + hadoop distcp -strategy dynamic -m 9500 hdfs://cluster1/source hdfs://cluster2/target + + Alternatively, you can use the **-D** parameter to set **distcp.dynamic.max.chunks.tolerable** to a large value. + + .. code-block:: + + hadoop distcp -Ddistcp.dynamic.max.chunks.tolerable=30000 -strategy dynamic hdfs://cluster1/source hdfs://cluster2/target diff --git a/doc/component-operation-guide-lts/source/using_hdfs/setting_permissions_on_files_and_directories.rst b/doc/component-operation-guide-lts/source/using_hdfs/setting_permissions_on_files_and_directories.rst new file mode 100644 index 0000000..3491f55 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/setting_permissions_on_files_and_directories.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_0807.html + +.. _mrs_01_0807: + +Setting Permissions on Files and Directories +============================================ + +Scenario +-------- + +HDFS allows users to modify the default permissions of files and directories. The default mask provided by the HDFS for creating file and directory permissions is **022**. If you have special requirements for the default permissions, you can set configuration items to change the default permissions. + +Configuration Description +------------------------- + +**Parameter portal:** + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. 
table:: **Table 1** Parameter description + + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Default Value | + +===========================+==================================================================================================================================================================================================+=======================+ + | fs.permissions.umask-mode | This **umask** value (user mask) is used when the user creates files and directories in the HDFS on the clients. This parameter is similar to the file permission mask on Linux. | 022 | + | | | | + | | The parameter value can be in octal or in symbolic, for example, **022** (octal, the same as **u=rwx,g=r-x,o=r-x** in symbolic), or **u=rwx,g=rwx,o=** (symbolic, the same as **007** in octal). | | + | | | | + | | .. note:: | | + | | | | + | | The octal mask is opposite to the actual permission value. You are advised to use the symbol notation to make the description clearer. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/setting_the_maximum_lifetime_and_renewal_interval_of_a_token.rst b/doc/component-operation-guide-lts/source/using_hdfs/setting_the_maximum_lifetime_and_renewal_interval_of_a_token.rst new file mode 100644 index 0000000..d9bd179 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/setting_the_maximum_lifetime_and_renewal_interval_of_a_token.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_0808.html + +.. _mrs_01_0808: + +Setting the Maximum Lifetime and Renewal Interval of a Token +============================================================ + +Scenario +-------- + +In security mode, users can flexibly set the maximum token lifetime and token renewal interval in HDFS based on cluster requirements. + +Configuration Description +------------------------- + +**Navigation path for setting parameters:** + +Go to the **All Configurations** page of HDFS and enter a parameter name in the search box by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +.. table:: **Table 1** Parameter description + + +----------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Default Value | + +==============================================+=========================================================================================================================================================+===============+ + | dfs.namenode.delegation.token.max-lifetime | This parameter is a server parameter. It specifies the maximum lifetime of a token. Unit: milliseconds. 
Value range: 10,000 to 10,000,000,000,000 | 604,800,000 | + +----------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | dfs.namenode.delegation.token.renew-interval | This parameter is a server parameter. It specifies the maximum lifetime to renew a token. Unit: milliseconds. Value range: 10,000 to 10,000,000,000,000 | 86,400,000 | + +----------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_hdfs/using_the_hdfs_client.rst b/doc/component-operation-guide-lts/source/using_hdfs/using_the_hdfs_client.rst new file mode 100644 index 0000000..1379b0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/using_the_hdfs_client.rst @@ -0,0 +1,96 @@ +:original_name: mrs_01_1663.html + +.. _mrs_01_1663: + +Using the HDFS Client +===================== + +Scenario +-------- + +This section describes how to use the HDFS client in an O&M scenario or service scenario. + +Prerequisites +------------- + +- The client has been installed. + + For example, the installation directory is **/opt/hadoopclient**. The client directory in the following operations is only an example. Change it to the actual installation directory. + +- Service component users are created by the administrator as required. In security mode, machine-machine users need to download the keytab file. A human-machine user needs to change the password upon the first login. (This operation is not required in normal mode.) + + +Using the HDFS Client +--------------------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client installation directory: + + **cd /opt/hadoopclient** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, user authentication is not required. + + **kinit** *Component service user* + +#. Run the HDFS Shell command. Example: + + **hdfs dfs -ls /** + +Common HDFS Client Commands +--------------------------- + +The following table lists common HDFS client commands. + +For more commands, see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CommandsManual.html#User_Commands. + +.. table:: **Table 1** Common HDFS client commands + + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + | Command | Description | Example | + +================================================================================+=============================================================+==========================================================================================+ + | **hdfs dfs -mkdir** *Folder name* | Used to create a folder. 
| **hdfs dfs -mkdir /tmp/mydir** | + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + | **hdfs dfs -ls** *Folder name* | Used to view a folder. | **hdfs dfs -ls /tmp** | + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + | **hdfs dfs -put** *Local file on the client node* | Used to upload a local file to a specified HDFS path. | **hdfs dfs -put /opt/test.txt /tmp** | + | | | | + | | | Upload the **/opt/tests.txt** file on the client node to the **/tmp** directory of HDFS. | + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + | **hdfs dfs -get** *Specified file on HDFS* *Specified path on the client node* | Used to download the HDFS file to the specified local path. | **hdfs dfs -get /tmp/test.txt /opt/** | + | | | | + | | | Download the **/tmp/test.txt** file on HDFS to the **/opt** path on the client node. | + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + | **hdfs dfs -rm -r -f** *Specified folder on HDFS* | Used to delete a folder. | **hdfs dfs -rm -r -f /tmp/mydir** | + +--------------------------------------------------------------------------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------+ + +Client-related FAQs +------------------- + +#. What do I do when the HDFS client exits abnormally and error message "java.lang.OutOfMemoryError" is displayed after the HDFS client command is running? + + This problem occurs because the memory required for running the HDFS client exceeds the preset upper limit (128 MB by default). You can change the memory upper limit of the client by modifying **CLIENT_GC_OPTS** in **\ **/HDFS/component_env**. For example, if you want to set the upper limit to 1 GB, run the following command: + + .. code-block:: + + CLIENT_GC_OPTS="-Xmx1G" + + After the modification, run the following command to make the modification take effect: + + **source** <*Client installation path*>/**/bigdata_env** + +#. How can i set the log level when the HDFS client is running? + + By default, the logs generated during the running of the HDFS client are printed to the console. The default log level is INFO. To enable the DEBUG log level for fault locating, run the following command to export an environment variable: + + **export HADOOP_ROOT_LOGGER=DEBUG,console** + + Then run the HDFS Shell command to generate the DEBUG logs. 
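+
+   For example, any client command run in the same shell then prints DEBUG logs to the console (the directory below is only illustrative):
+
+   .. code-block::
+
+      hdfs dfs -ls /tmp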
+ + If you want to print INFO logs again, run the following command: + + **export HADOOP_ROOT_LOGGER=INFO,console** diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_do_if_coordinators_and_workers_cannot_be_started_on_the_new_node.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_do_if_coordinators_and_workers_cannot_be_started_on_the_new_node.rst new file mode 100644 index 0000000..f3b7285 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_do_if_coordinators_and_workers_cannot_be_started_on_the_new_node.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24050.html + +.. _mrs_01_24050: + +How Do I Do If Coordinators and Workers Cannot Be Started on the New Node? +========================================================================== + +Question +-------- + +A new host is added to the cluster in security mode, the NodeManager instance is added, and the parameters of the HetuEngine compute instance are adjusted. After the HetuEngine compute instance is restarted, the Coordinators and Workers of the HetuEngine compute instance cannot be started on the new host. + +Answer +------ + +In security mode, the internal communication between nodes of HetuEngine compute instances uses the Simple and Protected GSS API Negotiation Mechanism (SPNEGO) authentication. The JKS certificate is required. It contains information about all nodes in the compute instance. The information about the newly added nodes is not contained in the original certificate. You need to restart all HetuEngine HSBroker instances to generate a new certificate, and then restart the HetuEngine compute instance. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_data_source_loss.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_data_source_loss.rst new file mode 100644 index 0000000..4f66921 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_data_source_loss.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_2323.html + +.. _mrs_01_2323: + +How Do I Handle Data Source Loss? +================================= + +Question +-------- + +Why is the data source lost when I log in to the client to check the data source connected to the HSConsole page? + +Answer +------ + +The possible cause of data source loss is that the DBService active/standby switchover occurs or the database connection usage exceeds the threshold. You can log in to FusionInsight Manager to view the alarm information and clear the DBService alarm based on the alarm guide. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_hetuengine_alarms.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_hetuengine_alarms.rst new file mode 100644 index 0000000..8f052f6 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_handle_hetuengine_alarms.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_2329.html + +.. _mrs_01_2329: + +How Do I Handle HetuEngine Alarms? +================================== + +Question +-------- + +Log in to FusionInsight Manager and HetuEngine alarms are generated for the cluster. 
+ +Answer +------ + +Log in to FusionInsight Manager, go to the O&M page, and view alarm details. You can click the drop-down button of an alarm to view the alarm details. For most alarms, you can locate and handle them based on the alarm causes in the alarm details. You can also view the online help information about an alarm by using the alarm help function. If the alarm is not automatically cleared, you can manually clear it after troubleshooting. + +#. Log in to FusionInsight Manager. +#. Choose **O&M** > **Alarm** > **Alarms**. +#. View alarm details in the alarm list. +#. Locate the row that contains the alarm and click **View Help** in the **Operation** column to obtain more help information. +#. Locate the fault based on the possible causes provided in the online help and clear the HetuEngine alarms based on the handling procedure provided in the online help. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_perform_operations_after_the_domain_name_is_changed.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_perform_operations_after_the_domain_name_is_changed.rst new file mode 100644 index 0000000..39c25a7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/how_do_i_perform_operations_after_the_domain_name_is_changed.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_2321.html + +.. _mrs_01_2321: + +How Do I Perform Operations After the Domain Name Is Changed? +============================================================= + +Question +-------- + +After the domain name is changed, the installed client configuration and data source configuration become invalid, and the created cluster is unavailable. When data sources in different domains are interconnected, HetuEngine automatically combines the **krb5.conf** file. After the domain name is changed, the domain name for Kerberos authentication changes. As a result, the information about the interconnected data source becomes invalid. + +Answer +------ + +- You need to reinstall the cluster client. + +- Delete the old data source information on HSConsole by referring to :ref:`Managing Data Sources ` and configure the data source information on the HSConsole again by referring to :ref:`Configuring Data Sources `. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/index.rst new file mode 100644 index 0000000..1a922dc --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1747.html + +.. _mrs_01_1747: + +Common Issues About HetuEngine +============================== + +- :ref:`How Do I Perform Operations After the Domain Name Is Changed? ` +- :ref:`What Do I Do If Starting a Cluster on the Client Times Out? ` +- :ref:`How Do I Handle Data Source Loss? ` +- :ref:`How Do I Handle HetuEngine Alarms? ` +- :ref:`How Do I Do If Coordinators and Workers Cannot Be Started on the New Node? ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + how_do_i_perform_operations_after_the_domain_name_is_changed + what_do_i_do_if_starting_a_cluster_on_the_client_times_out + how_do_i_handle_data_source_loss + how_do_i_handle_hetuengine_alarms + how_do_i_do_if_coordinators_and_workers_cannot_be_started_on_the_new_node diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/what_do_i_do_if_starting_a_cluster_on_the_client_times_out.rst b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/what_do_i_do_if_starting_a_cluster_on_the_client_times_out.rst new file mode 100644 index 0000000..c5bfbae --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/common_issues_about_hetuengine/what_do_i_do_if_starting_a_cluster_on_the_client_times_out.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_2322.html + +.. _mrs_01_2322: + +What Do I Do If Starting a Cluster on the Client Times Out? +=========================================================== + +Question +-------- + +If the cluster startup on the client takes a long time, the waiting times out and the waiting page exits. + +Answer +------ + +If the cluster startup times out, the waiting page automatically exits. You can log in to the cluster again until the cluster is successfully started. Additionally, you can also view the cluster running status on the HSConsole page. When the cluster is in the running state, log in to the cluster again. If the cluster fails to be started, you can locate the fault based on the startup logs. For details, see :ref:`Log Description `. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst new file mode 100644 index 0000000..6c785d2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst @@ -0,0 +1,57 @@ +:original_name: mrs_01_2315.html + +.. _mrs_01_2315: + +Before You Start +================ + +HetuEngine supports quick joint query of multiple data sources and GUI-based data source configuration and management. You can quickly add a data source on the HSConsole page. + +:ref:`Table 1 ` lists the data sources supported by HetuEngine of the current version. + +.. _mrs_01_2315__en-us_topic_0000001173789582_table667372634317: + +.. 
table:: **Table 1** List for connecting HetuEngine to data sources + + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | HetuEngine Mode | Data Source | Data Source Mode | Supported Data Source Version | + +=================+===============+==================+===============================================================+ + | Security mode | Hive | Security mode | MRS 3.x, FusionInsight 6.5.1, and Hive in the current cluster | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | HBase | | MRS 3.x and FusionInsight 6.5.1 | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | Elasticsearch | | Elasticsearch of the current cluster | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | HetuEngine | | MRS 3.x | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | GaussDB | | GaussDB 200 and GaussDB A 8.0.0 | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | Hudi | | MRS 3.1.1 or later | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | ClickHouse | | MRS 3.1.1 or later | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | Normal mode | Hive | Normal mode | MRS 3.x, FusionInsight 6.5.1, and Hive in the current cluster | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | HBase | | MRS 3.x and FusionInsight 6.5.1 | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | Elasticsearch | | Elasticsearch of the current cluster | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | Hudi | | MRS 3.1.1 or later | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | ClickHouse | | MRS 3.1.1 or later | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + | | GaussDB | Security mode | GaussDB 200 and GaussDB A 8.0.0 | + +-----------------+---------------+------------------+---------------------------------------------------------------+ + +Operations such as adding, configuring, and deleting a HetuEngine data source takes effect dynamically without restarting the cluster. + +A configured data source takes effect dynamically and you cannot disable this function. By default, the interval for a data source to dynamically take effect is 60 seconds. You can change the interval to a desired one by changing the value of **catalog.scanner-interval** in **coordinator.config.properties** and **worker.config.properties** by referring to :ref:`3.e ` in :ref:`Creating HetuEngine Compute Instances `. See the following example. + +.. code-block:: + + catalog.scanner-interval =120s + +.. note:: + + - The domain name of the data source cluster must be different from that of the HetuEngine cluster. 
In addition, two data sources with the same domain name cannot be connected to at the same time. + - The data source cluster and the HetuEngine cluster can communicate with each other. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_clickhouse_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_clickhouse_data_source.rst new file mode 100644 index 0000000..281bb39 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_clickhouse_data_source.rst @@ -0,0 +1,189 @@ +:original_name: mrs_01_24146.html + +.. _mrs_01_24146: + +Configuring a ClickHouse Data Source +==================================== + +Scenarios +--------- + +- Currently, HetuEngine supports the interconnection with the ClickHouse data source in the cluster of MRS 3.1.1 or later. +- The HetuEngine cluster in security mode supports the interconnection with the ClickHouse data source in the cluster of MRS 3.1.1 or later in security mode. +- The HetuEngine cluster in normal mode supports the interconnection with the ClickHouse data source in the cluster of MRS 3.1.1 or later in normal mode. +- In the ClickHouse data source, tables with the same name but in different cases, for example, **cktable** (lowercase), **CKTABLE** (uppercase), and **CKtable** (uppercase and lowercase), cannot co-exist in the same schema or database. Otherwise, tables in the schema or database cannot be used by HetuEngine. + +Prerequisites +------------- + +You have created a HetuEngine administrator by referring to :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Choose **Data Source**. +#. Click **Add Data Source**. On the **Add Data Source** page that is displayed, configure parameters. + + a. Configure parameters in the **Basic Configuration** area. For details about the parameters, see :ref:`Table 1 `. + + .. _mrs_01_24146__en-us_topic_0000001219350589_table1019212591518: + + .. table:: **Table 1** Basic Information + + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+================================================================================================================+=======================+ + | Name | Name of the data source to be connected. | clickhouse_1 | + | | | | + | | The value can contain only letters, digits, and underscores (_) and must start with a letter. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Source Type | Type of the data source to be connected. Choose **JDBC** > **ClickHouse**. | ClickHouse | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Description of the data source. 
| ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Configure parameters in the **ClickHouse Configuration** area. For details, see :ref:`Table 2 `. + + .. _mrs_01_24146__en-us_topic_0000001219350589_table17193559125115: + + .. table:: **Table 2** ClickHouse Configuration + + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +==================================+===============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+=================================================================================================================================================+ + | Driver | The default value is **clickhouse**. | clickhouse | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | JDBC URL | JDBC URL of the ClickHouse data source. | **jdbc:clickhouse://10.162.156.243:21426**, **jdbc:clickhouse://10.162.156.243:21425**, or **jdbc:clickhouse://[fec0::d916:8:5:164:200]:21426** | + | | | | + | | - If the ClickHouse data source uses IPv4, the format is **jdbc:clickhouse://:**. | | + | | - If the ClickHouse data source uses IPv6, the format is **jdbc:clickhouse://[]:**. | | + | | | | + | | - To obtain the value of ****, log in to Manager of the cluster where the ClickHouse data source is located, choose **Cluster** > **Services** > **ClickHouse** > **Instance**, and view the ClickHouseBalancer service IP address. 
| | + | | - To obtain the value of ****, log in to Manager of the cluster where the ClickHouse data source is located, and choose **Cluster** > **Services** > **ClickHouse** > **Configurations** > **All Configurations**. If the ClickHouse data source is in security mode, check the HTTPS port number of the ClickHouseBalancer instance, that is, the value of **lb_https_port**. If the ClickHouse data source is in normal mode, check the HTTP port number of the ClickHouseBalancer instance, that is, the value of **lb_http_port**. | | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Username | Username used for connecting to the ClickHouse data source. | Change the value based on the username being connected with the data source. | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Password | User password used for connecting to the ClickHouse data source. | Change the value based on the user password for connecting to the data source. | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + | Case-sensitive Table/Schema Name | Whether to support case-sensitive names or schemas of the data source. | ``-`` | + | | | | + | | HetuEngine supports case-sensitive names or schemas of the data source. | | + | | | | + | | - **No**: If multiple table names exist in the same schema of a data source, for example, **cktable** (lowercase), **CKTABLE** (uppercase), and **CKtable** (lowercase and uppercase), only **cktable** (lowercase) can be used by HetuEngine. 
| | + | | - **Yes**: Only one table name can exist in the same schema of the data source, for example, **cktable** (lowercase), **CKTABLE** (uppercase), or **CKtable** (lowercase and uppercase). Otherwise, all tables in the schema cannot be used by HetuEngine. | | + +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+ + + c. (Optional) Customize the configuration. + + You can click **Add** to add custom configuration parameters. Configure custom parameters of the ClickHouse data source. For details, see :ref:`Table 3 `. + + .. _mrs_01_24146__en-us_topic_0000001219350589_table188672024123816: + + .. table:: **Table 3** Custom parameters of the ClickHouse data source + + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +==========================================+================================================================================================================+=======================+ + | use-connection-pool | Whether to use the JDBC connection pool. | true | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.maxTotal | Maximum number of connections in the JDBC connection pool. | 8 | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.maxIdle | Maximum number of idle connections in the JDBC connection pool. | 8 | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.minIdle | Minimum number of idle connections in the JDBC connection pool. | 0 | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.testOnBorrow | Whether to check the connection validity when using a connection obtained from the JDBC connection pool. | false | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.pushdown-enabled | Whether to enable the pushdown function. 
| true | + | | | | + | | Default value: **true** | | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.pushdown-module | Pushdown type. | ``-`` | + | | | | + | | - **DEFAULT**: No operator is pushed down. | | + | | - **BASE_PUSHDOWN**: Only operators such as Filter, Aggregation, Limit, TopN, and Projection are pushed down. | | + | | - **FULL_PUSHDOWN**: All supported operators are pushed down. | | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | clickhouse.map-string-as-varchar | Whether to convert the ClickHouse data source of the String and FixedString types to the Varchar type. | true | + | | | | + | | Default value: **true** | | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | clickhouse.socket-timeout | Timeout interval for connecting to the ClickHouse data source. | 120000 | + | | | | + | | Unit: millisecond | | + | | | | + | | Default value: **120000** | | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | case-insensitive-name-matching.cache-ttl | Timeout interval for caching case-sensitive names of schemas or tables of the data sources. | 1 | + | | | | + | | Unit: minute | | + | | | | + | | Default value: **1** | | + +------------------------------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + + You can click **Delete** to delete custom configuration parameters. + + d. Click **OK**. + +Operation Guide +--------------- + +- :ref:`Table 4 ` lists the ClickHouse data types supported by HetuEngine. + + .. _mrs_01_24146__en-us_topic_0000001219350589_table14995183764415: + + .. 
table:: **Table 4** ClickHouse data types supported by HetuEngine + + +-----------------------------------------------+----------------------+---------------------------+ + | Name | ClickHouse Data Type | | + +===============================================+======================+===========================+ + | ClickHouse data types supported by HetuEngine | UInt8 | Decimal128(S) | + +-----------------------------------------------+----------------------+---------------------------+ + | | UInt16 | Boolean | + +-----------------------------------------------+----------------------+---------------------------+ + | | UInt32 | String | + +-----------------------------------------------+----------------------+---------------------------+ + | | UInt64 | Fixedstring(N) | + +-----------------------------------------------+----------------------+---------------------------+ + | | Int8 | UUID | + +-----------------------------------------------+----------------------+---------------------------+ + | | Int16 | Date | + +-----------------------------------------------+----------------------+---------------------------+ + | | Int32 | DateTime([timezone]) | + +-----------------------------------------------+----------------------+---------------------------+ + | | Int64 | Enum | + +-----------------------------------------------+----------------------+---------------------------+ + | | Float32 | LowCardinality(data_type) | + +-----------------------------------------------+----------------------+---------------------------+ + | | Float64 | Nullable(typename) | + +-----------------------------------------------+----------------------+---------------------------+ + | | Decimal(P, S) | IPv4 | + +-----------------------------------------------+----------------------+---------------------------+ + | | Decimal32(S) | IPv6 | + +-----------------------------------------------+----------------------+---------------------------+ + | | Decimal64(S) | ``-`` | + +-----------------------------------------------+----------------------+---------------------------+ + +- :ref:`Table 5 ` lists the tables and views that support the interconnection between HetuEngine and ClickHouse. + + .. _mrs_01_24146__en-us_topic_0000001219350589_table7424121341219: + + .. 
table:: **Table 5** Supported tables and views + + +---------------------------------------------------------------------------+-------------------------------------------------+ + | Name | Supported Table and View | + +===========================================================================+=================================================+ + | Tables that support the interconnection between HetuEngine and ClickHouse | Local table (MergeTree) | + +---------------------------------------------------------------------------+-------------------------------------------------+ + | | Replicated table (ReplicatedReplacingMergeTree) | + +---------------------------------------------------------------------------+-------------------------------------------------+ + | | Distributed table | + +---------------------------------------------------------------------------+-------------------------------------------------+ + | Views that support the interconnection between HetuEngine and ClickHouse | Normal view | + +---------------------------------------------------------------------------+-------------------------------------------------+ + | | Materialized view | + +---------------------------------------------------------------------------+-------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_gaussdb_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_gaussdb_data_source.rst new file mode 100644 index 0000000..b68b0ba --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_gaussdb_data_source.rst @@ -0,0 +1,174 @@ +:original_name: mrs_01_2351.html + +.. _mrs_01_2351: + +Configuring a GaussDB Data Source +================================= + +Scenario +-------- + +This section describes how to add a GaussDB JDBC data source on the HSConsole page. + +Prerequisites +------------- + +- The domain name of the cluster where the data source is located must be different from the HetuEngine cluster domain name. +- The cluster where the data source is located and the HetuEngine cluster nodes can communicate with each other. +- A HetuEngine compute instance has been created. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Choose **Data Source**. +#. Click **Add Data Source**. Configure parameters on the **Add Data Source** page. + + a. Configure **Basic Information**. For details, see :ref:`Table 1 `. + + .. _mrs_01_2351__en-us_topic_0000001219149679_table16806135773018: + + .. table:: **Table 1** Basic Information + + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+================================================================================================================+=======================+ + | Name | Name of the data source to be connected. | gaussdb_1 | + | | | | + | | The value can contain only letters, digits, and underscores (_) and must start with a letter. 
| | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Source Type | Type of the data source to be connected. Choose **JDBC** > **GAUSSDB-A**. | GAUSSDB-A | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Description of the data source. | ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Configure parameters in the **GAUSSDB-A Configuration** area. For details, see :ref:`Table 2 `. + + .. _mrs_01_2351__en-us_topic_0000001219149679_table102190549122: + + .. table:: **Table 2** GAUSSDB-A Configuration + + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +===============================================+===========================================================================================================================================================================================================================================================================================================================================+========================================================================================+ + | Driver | The default value is **gaussdba**. | gaussdba | + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | JDBC URL | JDBC URL for connecting to GaussDB A. The format is as follows: | jdbc:postgresql://10.0.136.1:25308/postgres | + | | | | + | | **jdbc:postgresql://**\ *CN service IP address*\ **:**\ *Port number*\ **/**\ *Database name* | | + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | Username | Username for connecting to the GaussDB data source. | Change the value based on the username being connected with the data source. 
| + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | Password | Password for connecting to the GaussDB data source. | Change the value based on the username and password for connecting to the data source. | + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | GaussDB User Information Configuration | Configure multiple GaussDB usernames and passwords in the format of **dataSourceUser** and **password** key-value pairs. | ``-`` | + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + | HetuEngine-GaussDB User Mapping Configuration | Multiple HetuEngine accounts are configured in the format of **hetuUser** and **dataSourceUser** key-value pairs, corresponding to one of the users configured in the **GaussDB User Information Configuration** area. When different HetuEngine users are used to access GaussDB, different GaussDB usernames and passwords can be used. | ``-`` | + +-----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ + + c. Configure the GaussDB data source user information. For details, see :ref:`Table 3 `. + + **GaussDB User Information Configuration** and **HetuEngine-GaussDB User Mapping Configuration** must be used together. When HetuEngine is connected to the GaussDB data source, HetuEngine users can have the same permissions of the mapped GaussDB data source user through mapping. Multiple HetuEngine users can correspond to one GaussDB user. + + .. _mrs_01_2351__en-us_topic_0000001219149679_table171526153351: + + .. 
table:: **Table 3** GaussDB User Information Configuration + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================+==========================================================================================================+ + | Data Source User | Data Source User | If the data source user is set to **gaussuser1**, a HetuEngine user mapped to **gaussuser1** must exist. | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. | For example, create **hetuuser1** and map it to **gaussuser1**. | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + | Password | User authentication password of the corresponding data source. | ``-`` | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------+ + + d. Configure the HetuEngine-GaussDB user mapping. For details, see :ref:`Table 4 `. + + .. _mrs_01_2351__en-us_topic_0000001219149679_table16160161519353: + + .. table:: **Table 4** HetuEngine-GaussDB User Mapping Configuration + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================+==============================================================================================================================+ + | HetuEngine User | HetuEngine username. | hetuuser1 | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. 
| | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+ + | Data Source User | Data source user. | **gaussuser1** (data source user configured in :ref:`Table 3 `) | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+ + + e. Modify custom configurations. + + - You can click **Add** to add custom configuration parameters. Configure custom parameters of the GaussDB data source. For details, see :ref:`Table 5 `. + + .. _mrs_01_2351__en-us_topic_0000001219149679_table132941558135018: + + .. table:: **Table 5** Custom parameters of the GaussDB data source + + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +============================================+=======================================================================================================================================================================================================================================================================+=======================+ + | use-connection-pool | Whether to use the JDBC connection pool. | true | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.maxTotal | Maximum number of connections in the JDBC connection pool. | 8 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.maxIdle | Maximum number of idle connections in the JDBC connection pool. | 8 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.connection.pool.minIdle | Minimum number of idle connections in the JDBC connection pool. 
| 0 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.pushdown-enabled | **true**: SQL statements can be pushed down to the data source for execution. | true | + | | | | + | | **false**: SQL statements are not pushed down to the data source for execution. As a result, more network and computing resources are consumed. | | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | jdbc.pushdown-module | The push-down function should be enabled in advance. | DEFAULT | + | | | | + | | - **DEFAULT**: No operator is pushed down. | | + | | - **BASE_PUSHDOWN**: Only operators such as Filter, Aggregation, Limit, TopN, and Projection are pushed down. | | + | | - **FULL_PUSHDOWN**: All supported operators are pushed down. | | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | source-encoding | GaussDB data source encoding mode. | UTF-8 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | multiple-cnn-enabled | Whether to use the GaussDB multi-CN configuration. To use it, ensure that the JDBC connection pool is disabled and the JDBC URL format is as follows: jdbc:postgresql://host:port/database,jdbc:postgresql://host:port/database,jdbc:postgresql://host:port/database. | false | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | parallel-read-enabled | Whether to use the parallel data read function. | false | + | | | | + | | If the parallel data read function is enabled, the actual number of splits is determined based on the node distribution and the value of **max-splits**. | | + | | | | + | | Multiple connections to the data source will be created for parallel read operations. The dependent data source should support the load. | | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | split-type | Type of the parallel data read function. 
| NODE | + | | | | + | | - **NODE**: The degree of parallelism (DOP) is categorized based on the GaussDB data source DataNodes. | | + | | - **PARTITION**: The DOP is categorized based on table partitions. | | + | | - **INDEX**: The DOP is categorized based on table indexes. | | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | max-splits | Maximum degree of parallelism. | 5 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | use-copymanager-for-insert | Whether to use CopyManager for batch import. | false | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | unsupported-type-handling | If the connector does not support the data of a certain type, convert it to VARCHAR. | CONVERT_TO_VARCHAR | + | | | | + | | - After the **CONVERT_TO_VARCHAR** parameter is configured, the data of BIT VARYING, CIDR, MACADDR, INET, OID, REGTYPE, REGCONFIG and POINT types are converted to the varchar type during query and data of these types can only be read. | | + | | - The default value is IGNORE, indicating that unsupported types will be not displayed in the result. | | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | max-bytes-in-a-batch-for-copymanager-in-mb | Maximum volume of data imported by CopyManager in a batch, in MB. | 10 | + +--------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + - You can click **Delete** to delete custom configuration parameters. + + f. Click **OK**. + + .. important:: + + - The UPDATE and DELETE syntaxes do not support filtering clauses containing cross-catalog conditions, for example, **UPDATE mppdb.table SET column1=value WHERE column2 IN (SELECT column2 from hive.table)**. + - To use the DELETE syntax, set **jdbc.pushdown-enabled** to **true** and **unsupported-type-handling** to **CONVERT_TO_VARCHAR**. + - The DELETE syntax does not support filtering clauses containing subqueries, for example, **DELETE FROM mppdb.table WHERE column IN (SELECT column FROM mppdb.table1)**. 
+ - HetuEngine supports a maximum precision of 38 digits for GaussDB data sources of the NUMBER data type diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hetuengine_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hetuengine_data_source.rst new file mode 100644 index 0000000..b3788d9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hetuengine_data_source.rst @@ -0,0 +1,122 @@ +:original_name: mrs_01_1719.html + +.. _mrs_01_1719: + +Configuring a HetuEngine Data Source +==================================== + +Scenario +-------- + +This section describes how to add another HetuEngine data source on the HSConsole page for a cluster in security mode. + +Currently, the following data sources are supported: boolean, tinyint, smallint, int, bigint, real, double, decimal, varchar, char, date, timestamp, array, map, time with timezone, timestamp with time zone, and Time. + +Procedure +--------- + +#. .. _mrs_01_1719__en-us_topic_0000001173631388_en-us_topic_0260680339_li3754145741715: + + Obtain the **user.keytab** file of the proxy user of the HetuEngine cluster in a remote domain. + + a. Log in to FusionInsight Manager of the HetuEngine cluster in the remote domain. + b. Choose **System** > **Permission** > **User**. + c. Locate the row that contains the target data source user, click **More** in the **Operation** column, and select **Download Authentication Credential**. + d. The **user.keytab** file extracted from the downloaded file is the user credential file. + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. + +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + +#. Choose **Data Source** and click **Add Data Source**. Configure parameters on the **Add Data Source** page. + + a. Configure **Basic Information**. For details, see :ref:`Table 1 `. + + .. _mrs_01_1719__en-us_topic_0000001173631388_en-us_topic_0260680339_en-us_topic_0260386270_table14393741174611: + + .. table:: **Table 1** Basic Information + + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+====================================================================================================================================================================================================================================================+=======================+ + | Name | Name of the data source to be connected. | hetu_1 | + | | | | + | | The value can contain only letters, digits, and underscores (_) and must start with a letter. If the data source name contains uppercase letters, the background automatically converts the uppercase letters to lowercase letters during storage. 
| | + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Source Type | Type of the data source to be connected. Select **HetuEngine**. | HetuEngine | + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Mode | Mode of the current cluster. The security mode is used by default. | Security Mode | + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Description of the data source. | ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Configure parameters in the **HetuEngine Configuration** area. For details, see :ref:`Table 2 `. + + .. _mrs_01_1719__en-us_topic_0000001173631388_en-us_topic_0260680339_en-us_topic_0260386270_table0303122318010: + + .. table:: **Table 2** HetuEngine Configuration + + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +============================+===========================================================================================================================================================================================================================================+=======================+ + | Driver | The default value is **hsfabric-initial**. | hsfabric-initial | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Username | Configure this parameter when the security mode is enabled. | hetu_test | + | | | | + | | It specifies the user who accesses the remote HetuEngine. Set the parameter to the user to whom the **user.keytab** file obtained in :ref:`1 ` belongs. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Keytab File | Configure this parameter when the security mode is enabled. 
| user.keytab | + | | | | + | | This is the Keytab file of the user who accesses the remote DataCenter. Select the **user.keytab** file obtained in :ref:`1 `. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Two Way Transmission | This parameter indicates whether to enable bidirectional transmission for cross-domain data transmission. The default value is **Yes**. | Yes | + | | | | + | | - **Yes**: Two-way transmission: Requests are forwarded to the remote HSFabric through the local HSFabric. If two-way transmission is enabled, the local HSFabric address must be configured. | | + | | - No: Unidirectional transmission: Requests are directly sent to the remote HSFabric. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Local Configuration | Host IP address and port number of the HSFabric instance that is responsible for external communication of the HetuEngine service in the local MRS cluster. | 192.162.157.32:29900 | + | | | | + | | #. Log in to FusionInsight Manager of the local cluster, choose **Cluster** > **Services** > **HetuEngine** > **Instance**, and check the service IP address of the HSFabric. | | + | | #. Click **HSFabric**, choose **Instance Configuration**, and check the value of **server.port**. The default value is **29900**. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Remote Address | Host IP address and port number of the HSFabric instance that is responsible for external communication of the HetuEngine service in the remote MRS cluster. | 192.168.1.1:29900 | + | | | | + | | #. Log in to FusionInsight Manager of the remote cluster, choose **Cluster** > **Services** > **HetuEngine** > **Instance**, and check the service IP address of the HSFabric. | | + | | #. Click **HSFabric**, choose **Instance Configuration**, and check the value of **server.port**. The default value is **29900**. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Region | Region to which the current request initiator belongs. The value can contain only digits and underscores (_). | 0755_01 | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Receiving Data Timeout (s) | Timeout interval for receiving data, in seconds. 
| 60 | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Total Task Timeout (s) | Total timeout duration for executing a cross-domain task, in seconds. | 300 | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Tasks Used by Worker Nodes | Number of tasks used by each worker node to receive data. | 5 | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Compression | - **Yes**: Data compression is enabled. | Yes | + | | - **No**: Data compression is disabled. | | + +----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + c. Modify custom configurations. + + - You can click **Add** to add custom configuration parameters. Configure custom parameters of the HetuEngine data source. For details, see :ref:`Table 3 `. + + .. _mrs_01_1719__en-us_topic_0000001173631388_table107221036132419: + + .. table:: **Table 3** Custom parameters of the HetuEngine data source + + +----------------------------+------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +============================+====================================================================================+=======================+ + | hsfabric.health.check.time | Interval for checking the HSFabric instance status, in seconds. | 60 | + +----------------------------+------------------------------------------------------------------------------------+-----------------------+ + | hsfabric.subquery.pushdown | Whether to enable cross-domain query pushdown. The function is enabled by default. | true | + | | | | + | | - **true**: enables cross-domain query pushdown. | | + | | - **false**: disables cross-domain query pushdown. | | + +----------------------------+------------------------------------------------------------------------------------+-----------------------+ + + - You can click **Delete** to delete custom configuration parameters. + + d. Click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_co-deployed_hive_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_co-deployed_hive_data_source.rst new file mode 100644 index 0000000..7bf50e1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_co-deployed_hive_data_source.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_24253.html + +.. 
_mrs_01_24253: + +Configuring a Co-deployed Hive Data Source +========================================== + +Scenario +-------- + +This section describes how to add a Hive data source of the same Hadoop cluster as HetuEngine on HSConsole. + +Currently, HetuEngine supports data sources of the following traditional data formats: AVRO, TEXT, RCTEXT, Binary, ORC, Parquet, and SequenceFile. + +Prerequisites +------------- + +A HetuEngine compute instance has been created. + +.. note:: + + The HetuEngine service is interconnected to its co-deployed Hive data source by default during its installation. The data source name is **hive** and cannot be deleted. Some default configurations cannot be modified. You need to restart the HetuEngine service to automatically synchronize these unmodifiable configurations once they are updated. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. +#. On HSConsole, choose **Data Source**. Locate the row that contains the target Hive data source, click **Edit** in the **Operation** column, and modify the configurations. The following table describes data source configurations that can be modified. + + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + | Parameter | Description | Example Value | + +===================================+====================================================================================================================================================================+=======================================================+ + | Description | Description of the data source. | ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + | Enable Data Source Authentication | Whether to use the permission policy of the Hive data source for authentication. | No | + | | | | + | | If Ranger is disabled for the HetuEngine service, select **Yes**. If Ranger is enabled, select **No**. | | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + | Metastore URL | Value of **hive.metastore.uris** in **hive-site.xml** on the data source client. When the Hive MetaStore instance changes, you need to manually update this value. | thrift://192.168.1.1:21088,thrift://192.168.1.2:21088 | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + | Enable Connection Pool | Whether to enable the connection pool when accessing Hive MetaStore. 
The default value is **Yes** | Yes | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + | Maximum Connections | Maximum number of connections in the connection pool when accessing Hive MetaStore. | 50 (Value range: 0-200) | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------+ + +#. (Optional) If you need to add **Custom Configuration**, complete the configurations by referring to :ref:`6.g ` and click **OK** to save the configurations. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_hudi_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_hudi_data_source.rst new file mode 100644 index 0000000..c88031b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_hudi_data_source.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_2363.html + +.. _mrs_01_2363: + +Configuring a Hudi Data Source +============================== + +Scenario +-------- + +HetuEngine can be connected to the Hudi data source of the cluster of MRS 3.1.1 or later. + +.. note:: + + HetuEngine does not support the reading of Hudi bootstrap tables. + +Prerequisites +------------- + +- You have created the proxy user of the Hudi data source. The proxy user is a human-machine user and must belong to the **hive** group. +- You have created a HetuEngine administrator by referring to :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Perform :ref:`1 ` to :ref:`6.g ` to configure a traditional data source by referring to :ref:`Configuring a Traditional Data Source `. + +#. In the **Custom Configuration** area, add a custom parameter, as listed in :ref:`Table 1 `. + + .. _mrs_01_2363__en-us_topic_0000001219149163_table1687934216313: + + .. table:: **Table 1** Custom configuration parameter for Hudi data + + ============================= ===== + Custom Parameter Name Value + ============================= ===== + hive.parquet.use-column-names true + ============================= ===== + +#. Click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_traditional_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_traditional_data_source.rst new file mode 100644 index 0000000..89e78b9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/configuring_a_traditional_data_source.rst @@ -0,0 +1,278 @@ +:original_name: mrs_01_2348.html + +.. _mrs_01_2348: + +Configuring a Traditional Data Source +===================================== + +Scenario +-------- + +This section describes how to add a Hive data source on HSConsole. + +Currently, HetuEngine supports data sources of the following traditional data formats: AVRO, TEXT, RCTEXT, Binary, ORC, Parquet, and SequenceFile. 
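+
+For reference, a Hive-side table in one of these storage formats might be created as shown below. This is an illustrative sketch only; the **default.demo_orc** table name and its columns are hypothetical and not part of this guide.
+
+.. code-block::
+
+   -- Hypothetical Hive table stored as ORC; Parquet, AVRO, TEXT, RCTEXT,
+   -- Binary, or SequenceFile could be used instead. Once the Hive data
+   -- source is added on HSConsole (see the procedure below), such tables
+   -- can be read through HetuEngine.
+   CREATE TABLE IF NOT EXISTS default.demo_orc (
+       id   INT,
+       name STRING
+   )
+   STORED AS ORC;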
+ +Prerequisites +------------- + +- The domain name of the cluster where the data source is located must be different from the HetuEngine cluster domain name. +- The cluster where the data source is located and the HetuEngine cluster nodes can communicate with each other. +- A HetuEngine compute instance has been created. + +Procedure +--------- + +#. .. _mrs_01_2348__en-us_topic_0000001219351171_li4515762384: + + Obtain the **hdfs-site.xml** and **core-site.xml** configuration files of the Hive data source cluster. + + a. Log in to FusionInsight Manager of the cluster where the Hive data source is located. + + b. Choose **Cluster** > **Dashboard**. + + c. Choose **More** > **Download Client** and download the client file to the local computer. + + d. Decompress the downloaded client file package and obtain the **core-site.xml** and **hdfs-site.xml** files in the **FusionInsight_Cluster_1_Services_ClientConfig/HDFS/config** directory. + + e. Check whether the **core-site.xml** file contains the **fs.trash.interval** configuration item. If not, add the following configuration items: + + .. code-block:: + + + fs.trash.interval + 2880 + + + f. Change the value of **dfs.client.failover.proxy.provider.hacluster** in the **hdfs-site.xml** file to **org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider**. + + .. code-block:: + + + dfs.client.failover.proxy.provider.hacluster + org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider + + + - If HDFS has multiple NameServices, change the values of **dfs.client.failover.proxy.provider.**\ *NameService name* for multiple NameServices in the **hdfs-site.xml** file to **org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider**. + - In addition, if the **hdfs-site.xml** file references the host name of a non-HetuEngine cluster node, you need to add the mapping between the referenced host name and the corresponding IP address to the **/etc/hosts** file of each HetuEngine cluster node. Otherwise, HetuEngine cannot connect to the node that is not in this cluster based on the host name. + + .. important:: + + If the Hive data source to be interconnected is in the same Hadoop cluster with HetuEngine, you can log in to the HDFS client and run the following commands to obtain the **hdfs-site.xml** and **core-site.xml** configuration files. For details, see :ref:`Using the HDFS Client `. + + **hdfs dfs -get /user/hetuserver/fiber/restcatalog/hive/core-site.xml** + + **hdfs dfs -get /user/hetuserver/fiber/restcatalog/hive/hdfs-site.xml** + +#. .. _mrs_01_2348__en-us_topic_0000001219351171_li051517619384: + + Obtain the **user.keytab** and **krb5.conf** files of the proxy user of the Hive data source. + + a. Log in to FusionInsight Manager of the cluster where the Hive data source is located. + b. Choose **System** > **Permission** > **User**. + c. Locate the row that contains the target data source user, click **More** in the **Operation** column, and select **Download Authentication Credential**. + d. Decompress the downloaded package to obtain the **user.keytab** and **krb5.conf** files. + + .. note:: + + The proxy user of the Hive data source must be associated with at least the **hive** user group. + +#. .. _mrs_01_2348__en-us_topic_0000001219351171_li05151668382: + + Obtain the MetaStore URL and the Principal of the server. + + a. 
Decompress the client package of the cluster where the Hive data source is located and obtain the **hive-site.xml** file from the **FusionInsight_Cluster_1_Services_ClientConfig/Hive/config** directory. + b. Open the **hive-site.xml** file and search for **hive.metastore.uris**. The value of **hive.metastore.uris** is the value of MetaStore URL. Search for **hive.server2.authentication.kerberos.principal**. The value of **hive.server2.authentication.kerberos.principal** is the value of Principal on the server. + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. + +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + +#. Choose **Data Source** and click **Add Data Source**. Configure parameters on the **Add Data Source** page. + + a. Configure parameters in the **Basic Information** area. For details, see :ref:`Table 1 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table14381114211546: + + .. table:: **Table 1** Basic Information + + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+================================================================================================================+=======================+ + | Name | Name of the data source to be connected. | hive_1 | + | | | | + | | The value can contain only letters, digits, and underscores (_) and must start with a letter. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Source Type | Type of the data source to be connected. Select **Hive**. | Hive | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Mode | Mode of the current cluster. The default value is **Security Mode**. | ``-`` | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Description of the data source. | ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Configure parameters in the **Hive Configuration** area. For details, see :ref:`Table 2 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table738124295411: + + .. table:: **Table 2** Hive Configuration + + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +===================================+===============================================================================================================================================================================+=======================+ + | Driver | The default value is **fi-hive-hadoop**. 
| fi-hive-hadoop | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | hdfs-site File | Select the **hdfs-site.xml** configuration file obtained in :ref:`1 `. The file name is fixed. | ``-`` | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | core-site File | Select the **core-site.xml** configuration file obtained in :ref:`1 `. The file name is fixed. | ``-`` | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | krb5 File | Configure this parameter when the security mode is enabled. | krb5.conf | + | | | | + | | It is the configuration file used for Kerberos authentication. Select the **krb5.conf** file obtained in :ref:`2 `. | | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Enable Data Source Authentication | Whether to use the permission policy of the Hive data source for authentication. | No | + | | | | + | | If Ranger is disabled for the HetuEngine service, select **Yes**. If Ranger is enabled, select **No**. | | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + c. Configure parameters in the **MetaStore Configuration** area. For details, see :ref:`Table 3 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table18381204214544: + + .. table:: **Table 3** MetaStore Configuration + + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +===================================+==============================================================================================================================================================================================================+===============================================================================+ + | Metastore URL | URL of the MetaStore of the data source. For details, see :ref:`3 `. | thrift://10.92.8.42:21088,thrift://10.92.8.43:21088,thrift://10.92.8.44:21088 | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + | Security Authentication Mechanism | After the security mode is enabled, the default value is **KERBEROS**. 
| KERBEROS | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + | Server Principal | Configure this parameter when the security mode is enabled. | hive/hadoop.hadoop.com@HADOOP.COM | + | | | | + | | It specifies the username with domain name used by meta to access MetaStore. For details, see :ref:`3 `. | | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + | Client Principal | Configure this parameter when the security mode is enabled. | admintest@HADOOP.COM | + | | | | + | | The parameter format is as follows: *Username for accessing MetaStore*\ **@**\ *domain name (uppercase)*\ **.COM**. | | + | | | | + | | *Username for accessing MetaStore* is the user to which the **user.keytab** file obtained in :ref:`2 ` belongs. | | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + | Keytab File | Configure this parameter when the security mode is enabled. | user.keytab | + | | | | + | | It specifies the keytab credential file of the MetaStore user name. The file name is fixed. Select the **user.keytab** file obtained in :ref:`2 `. | | + +-----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+ + + d. Configure parameters in the **Connection Pool Configuration** area. For details, see :ref:`Table 4 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table3527185012913: + + .. table:: **Table 4** Connection Pool Configuration + + +------------------------+-------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +========================+=====================================================================================+===============+ + | Enable Connection Pool | Whether to enable the connection pool when accessing Hive MetaStore. | Yes/No | + +------------------------+-------------------------------------------------------------------------------------+---------------+ + | Maximum Connections | Maximum number of connections in the connection pool when accessing Hive MetaStore. | 50 | + +------------------------+-------------------------------------------------------------------------------------+---------------+ + + e. Configure parameters in **Hive User Information Configuration**. For details, see :ref:`Table 5 `. + + **Hive User Information Configuration** and **HetuEngine-Hive User Mapping Configuration** must be used together. 
When HetuEngine is connected to the Hive data source, user mapping enables HetuEngine users to have the same permissions of the mapped Hive data source user. Multiple HetuEngine users can correspond to one Hive user. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table17970173051814: + + .. table:: **Table 5** Hive User Information Configuration + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================+=======================================================================================================================================================================+ + | Data Source User | Data source user information. | If the data source user is set to **hiveuser1**, a HetuEngine user mapped to **hiveuser1** must exist. For example, create **hetuuser1** and map it to **hiveuser1**. | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Keytab File | Obtain the authentication credential of the user corresponding to the data source. | hiveuser1.keytab | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + f. Configure parameters in the **HetuEngine-Hive User Mapping Configuration** area. For details, see :ref:`Table 6 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table363354362015: + + .. 
table:: **Table 6** HetuEngine-Hive User Mapping Configuration + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=========================================================================================================================================================================================================================+===============================================================================================================================+ + | HetuEngine User | HetuEngine user information. | hetuuser1 | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Data Source User | Data source user information. | **hiveuser1** (data source user configured in :ref:`Table 5 `) | + | | | | + | | The value can contain only letters, digits, underscores (_), hyphens (-), and periods (.), and must start with a letter or underscore (_). The minimum length is 2 characters and the maximum length is 100 characters. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + + g. .. _mrs_01_2348__en-us_topic_0000001219351171_li1438274211549: + + Modify custom configurations. + + - You can click **Add** to add custom configuration parameters by referring to :ref:`Table 7 `. + + .. _mrs_01_2348__en-us_topic_0000001219351171_table16627151232810: + + .. table:: **Table 7** Custom parameters + + +-----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=========================================+=================================================================================================================================================================+================================================================================================================+ + | hive.metastore.connection.pool.maxTotal | Maximum number of connections in the connection pool. 
| 50 (Value range: 0-200) | + +-----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ + | hive.metastore.connection.pool.maxIdle | Maximum number of idle threads in the connection pool. When the number of idle threads reaches the maximum number, new threads are not released. | 10 (The value ranges from 0 to 200 and cannot exceed the maximum number of connections.) | + | | | | + | | Default value: **10**. | | + +-----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ + | hive.metastore.connection.pool.minIdle | Minimum number of idle threads in the connection pool. When the number of idle threads reaches the minimum number, the thread pool does not create new threads. | 10 (The value ranges from 0 to 200 and cannot exceed the value of **hive.metastore.connection.pool.maxIdle**.) | + | | | | + | | Default value: **10**. | | + +-----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+ + + - You can click **Delete** to delete custom configuration parameters. + + .. note:: + + - You can add prefixes **coordinator.** and **worker.** to the preceding custom configuration items to configure coordinators and workers, respectively. For example, if **worker.hive.metastore.connection.pool.maxTotal** is set to **50**, a maximum number of 50 connections are allowed for workers to access Hive MetaStore. If no prefix is added, the configuration item is valid for both coordinators and workers. + - By default, the maximum number of connections for coordinators to access Hive MetaStore is 5 and the maximum and minimum numbers of idle data source connections are both 10. The maximum number of connections for workers to access Hive MetaStore is 20, the maximum and minimum numbers of idle data source connections are both 0. + + h. Click **OK**. + +#. Log in to the node where the cluster client is located and run the following commands to switch to the client installation directory and authenticate the user: + + **cd /opt/client** + + **source bigdata_env** + + **kinit** *User performing HetuEngine operations* (If the cluster is in normal mode, skip this step.) + +#. Run the following command to log in to the catalog of the data source: + + **hetu-cli --catalog** *Data source name* **--schema default** + + For example, run the following command: + + **hetu-cli --catalog** **hive\_1** **--schema default** + +#. Run the following command to view the database table: + + **show tables;** + + .. 
code-block:: + + Table + --------- + hivetb + (1 rows) + + Query 20210730_084524_00023_u3sri@default@HetuEngine, FINISHED, 3 nodes + Splits: 36 total, 36 done (100.00%) + 0:00 [2 rows, 47B] [7 rows/s, 167B/s] diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/index.rst new file mode 100644 index 0000000..1b71c94 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_a_hive_data_source/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_24174.html + +.. _mrs_01_24174: + +Configuring a Hive Data Source +============================== + +- :ref:`Configuring a Co-deployed Hive Data Source ` +- :ref:`Configuring a Traditional Data Source ` +- :ref:`Configuring a Hudi Data Source ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_a_co-deployed_hive_data_source + configuring_a_traditional_data_source + configuring_a_hudi_data_source diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_hbase_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_hbase_data_source.rst new file mode 100644 index 0000000..0e1c1ca --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_hbase_data_source.rst @@ -0,0 +1,155 @@ +:original_name: mrs_01_2349.html + +.. _mrs_01_2349: + +Configuring an HBase Data Source +================================ + +Scenario +-------- + +This section describes how to add an HBase data source on HSConsole. + +Prerequisites +------------- + +- The domain name of the cluster where the data source is located must be different from the HetuEngine cluster domain name. +- The cluster where the data source is located and the HetuEngine cluster nodes can communicate with each other. +- A HetuEngine compute instance has been created. + +- The SSL communication encryption configuration of ZooKeeper in the cluster where the data source is located must be the same as that of ZooKeeper in the cluster where HetuEngine is located. + + .. note:: + + To check whether SSL communication encryption is enabled, log in to FusionInsight Manager, choose **Cluster** > **Services** > **ZooKeeper** > **Configurations** > **All Configurations**, and enter **ssl.enabled** in the search box. If the value of **ssl.enabled** is **true**, SSL communication encryption is enabled. If the value is **false**, SSL communication encryption is disabled. + +Procedure +--------- + +#. .. _mrs_01_2349__en-us_topic_0000001173949888_li1823132412324: + + Obtain the **hbase-site.xml**, **hdfs-site.xml**, and **core-site.xml** configuration files of the HBase data source. + + a. Log in to FusionInsight Manager of the cluster where the HBase data source is located. + + b. Choose **Cluster** > **Dashboard**. + + c. Choose **More** > **Download Client** and download the client file as prompted. + + d. Decompress the downloaded client file package and obtain the **hbase-site.xml**, **core-site.xml**, and **hdfs-site.xml** files in the **FusionInsight_Cluster_1_Services_ClientConfig/HBase/config** directory. + + e. If **hbase.rpc.client.impl** exists in the **hbase-site.xml** file, change the value of **hbase.rpc.client.impl** to **org.apache.hadoop.hbase.ipc.RpcClientImpl**. + + .. 
code-block:: + + + hbase.rpc.client.impl + org.apache.hadoop.hbase.ipc.RpcClientImpl + + + .. note:: + + In addition, if the **hdfs-site.xml** and **hbase-site.xml** files reference the host name of a non-HetuEngine cluster node, you need to add the mapping between the referenced host name and the corresponding IP address to the **/etc/hosts** file of each node in the HetuEngine cluster. Otherwise, HetuEngine cannot connect to the node that is not in this cluster based on the host name. + +#. .. _mrs_01_2349__en-us_topic_0000001173949888_li1823134583712: + + Obtain the **user.keytab** and **krb5.conf** files of the proxy user of the HBase data source. + + a. Log in to FusionInsight Manager of the cluster where the HBase data source is located. + b. Choose **System** > **Permission** > **User**. + c. Locate the row that contains the target data source user, click **More** in the **Operation** column, and select **Download Authentication Credential**. + d. Decompress the downloaded package to obtain the **user.keytab** and **krb5.conf** files. + + .. note:: + + The proxy user of the data source must have the permission to perform HBase operations. + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. + +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + +#. Choose **Data Source**. + +6. Click **Add Data Source**. Configure parameters on the **Add Data Source** page. + + a. Configure **Basic Information**. For details, see :ref:`Table 1 `. + + .. _mrs_01_2349__en-us_topic_0000001173949888_table1755613194216: + + .. table:: **Table 1** Basic Information + + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+================================================================================================================+=======================+ + | Name | Name of the data source to be connected. | hbase_1 | + | | | | + | | The value can contain only letters, digits, and underscores (_) and must start with a letter. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Data Source Type | Type of the data source to be connected. Select **HBase**. | HBase | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Description of the data source. | ``-`` | + | | | | + | | The value can contain only letters, digits, commas (,), periods (.), underscores (_), spaces, and line breaks. | | + +-----------------------+----------------------------------------------------------------------------------------------------------------+-----------------------+ + + b. Configure parameters in the **HBase Configuration** area. For details, see :ref:`Table 2 `. + + .. _mrs_01_2349__en-us_topic_0000001173949888_table1075911311429: + + .. 
table:: **Table 2** HBase Configuration + + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | Parameter | Description | Example Value | + +====================================+================================================================================================================================================================================================================================================+=================================================+ + | Driver | The default value is **hbase-connector**. | hbase-connector | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | ZooKeeper Quorum Address | Service IP addresses of all quorumpeer instances of the ZooKeeper service for the data source. If the ZooKeeper service of the data source uses IPv6, you need to specify the client port number in the ZooKeeper Quorum address. | - IPv4: 10.0.136.132,10.0.136.133,10.0.136.134 | + | | | - IPv6: [0.0.0.0.0.0.0.0]:24002 | + | | Log in to FusionInsight Manager, choose **Cluster** > **Services** > **ZooKeeper** > **Instance**, and view the IP addresses of all the hosts housing the quorumpeer instances. | | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | ZooKeeper Client Port Number | Port number of the ZooKeeper client. | 2181 | + | | | | + | | Log in to FusionInsight Manager and choose **Cluster** > **Service** > **ZooKeeper**. On the **Configurations** tab page, check the value of **clientPort**. | | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | HBase RPC Communication Protection | Set this parameter based on the value of **hbase.rpc.protection** in the **hbase-site.xml** file obtained in :ref:`1 `. | No | + | | | | + | | - If the value is **authentication**, set this parameter to **No**. | | + | | - If the value is **privacy**, set this parameter to **Yes**. | | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | Security Authentication Mechanism | After the security mode is enabled, the default value is **KERBEROS**. 
| KERBEROS | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | Principal | Configure this parameter when the security authentication mechanism is enabled. Set the parameter to the user to whom the **user.keytab** file obtained in :ref:`2 ` belongs. | user_hbase@HADOOP2.COM | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | Keytab File | Configure this parameter when the security mode is enabled. It specifies the security authentication key. Select the **user.keytab** file obtained in :ref:`2 `. | user.keytab | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | krb5 File | Configure this parameter when the security mode is enabled. It is the configuration file used for Kerberos authentication. Select the **krb5.conf** file obtained in :ref:`2 `. | krb5.conf | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | hbase-site File | Configure this parameter when the security mode is enabled. It is the configuration file required for connecting to HDFS. Select the **hbase-site.xml** file obtained in :ref:`1 `. | hbase-site.xml | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | core-site File | Configure this parameter when the security mode is enabled. This file is required for connecting to HDFS. Select the **core-site.xml** file obtained in :ref:`1 `. | core-site.xml | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + | hdfs-site File | Configure this parameter when the security mode is enabled. This file is required for connecting to HDFS. Select the **hdfs-site.xml** file obtained in :ref:`1 `. 
| hdfs-site.xml | + +------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+ + + c. Modify custom configurations. + + - You can click **Add** to add custom configuration parameters. + - You can click **Delete** to delete custom configuration parameters. + + d. Click **OK**. + +7. Log in to the node where the cluster client is located and run the following commands to switch to the client installation directory and authenticate the user: + + **cd /opt/client** + + **source bigdata_env** + + **kinit** *User performing HetuEngine operations* (If the cluster is in normal mode, skip this step.) + +8. Run the following command to log in to the catalog of the data source: + + **hetu-cli --catalog** *Data source name* **--schema default** + + For example, run the following command: + + **hetu-cli --catalog** **hbase_1** **--schema default** diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst new file mode 100644 index 0000000..eff1440 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_2314.html + +.. _mrs_01_2314: + +Configuring Data Sources +======================== + +- :ref:`Before You Start ` +- :ref:`Configuring a Hive Data Source ` +- :ref:`Configuring an HBase Data Source ` +- :ref:`Configuring a GaussDB Data Source ` +- :ref:`Configuring a HetuEngine Data Source ` +- :ref:`Configuring a ClickHouse Data Source ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + before_you_start + configuring_a_hive_data_source/index + configuring_an_hbase_data_source + configuring_a_gaussdb_data_source + configuring_a_hetuengine_data_source + configuring_a_clickhouse_data_source diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/creating_hetuengine_compute_instances.rst b/doc/component-operation-guide-lts/source/using_hetuengine/creating_hetuengine_compute_instances.rst new file mode 100644 index 0000000..bf35400 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/creating_hetuengine_compute_instances.rst @@ -0,0 +1,149 @@ +:original_name: mrs_01_1731.html + +.. _mrs_01_1731: + +Creating HetuEngine Compute Instances +===================================== + +Scenario +-------- + +This section describes how to create a HetuEngine compute instance. If you want to stop the cluster where compute instances are successfully created, you need to manually stop the compute instances first. If you want to use the compute instances after the cluster is restarted, you need to manually start them. + +Prerequisites +------------- + +- You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. +- You have created a tenant in the cluster to be operated. For details about how to create a tenant. Ensure that the tenant has sufficient memory and CPUs when modifying the HetuEngine compute instance configuration. + + .. note:: + + You must use a leaf tenant when creating a HetuEngine compute instance. Yarn tasks can be submitted only to the queues of the leaf tenant. + +Procedure +--------- + +#. 
Log in to FusionInsight Manager as a user who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Create Configuration** above the instance list. In the **Configure Instance** dialog box, set parameters. + + a. Set parameters in the **Basic Configuration** area. For details about the parameters, see :ref:`Table 1 `. + + .. _mrs_01_1731__en-us_topic_0000001173631202_table18558203113529: + + .. table:: **Table 1** Basic configuration + + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+ + | Parameter | Description | Example Value | + +========================================+================================================================================================================================================================================================================================================================================================================================================================================================================+============================================================+ + | Resource Queue | Resource queue of the instance. Only one compute instance can be created in a resource queue. | Select a queue from the **Resource Queue** drop-down list. | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+ + | Instance Deployment Timeout Period (s) | Timeout interval for starting a compute instance by Yarn service deployment. The system starts timing when the compute instance is started. If the compute instance is still in the **Creating** or **Starting** state after the time specified by this parameter expires, the compute instance status is displayed as **Error** and the compute instance that is being created or started on Yarn is stopped. | 300 | + | | | | + | | | The value ranges from 1 to 2147483647. | + +----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+ + + b. Set parameters in the **Coordinator Container Resource Configuration** area. 
For details about the parameters, see :ref:`Table 2 `. + + .. _mrs_01_1731__en-us_topic_0000001173631202_table6559143115525: + + .. table:: **Table 2** Parameters for configuring Coordinator container resources + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=======================================================================================================================================================================================================================================================================================================+========================================+ + | Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 5120 | + | | | | + | | | The value ranges from 1 to 2147483647. | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Coordinator | Default value: 1 | + | | | | + | | | The value ranges from 1 to 2147483647. | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | Quantity | Number of containers allocated by Yarn to the compute instance Coordinator | Default value: 2 | + | | | | + | | | The value ranges from 1 to 3. | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | JVM | Log in to FusionInsight Manager and choose **Cluster** > **Services** > **HetuEngine** > **Configurations**. On the **All Configurations** tab page, search for **extraJavaOptions**. The value of this parameter in the **coordinator.jvm.config** parameter file is the value of the JVM parameter. | ``-`` | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + + c. Set parameters in the **Worker Container Resource Configuration** area. For details about the parameters, see :ref:`Table 3 `. + + .. _mrs_01_1731__en-us_topic_0000001173631202_table25611531175211: + + .. 
table:: **Table 3** Parameters for configuring Worker container resources + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | Parameter | Description | Example Value | + +=======================+==================================================================================================================================================================================================================================================================================================+========================================+ + | Container Memory (MB) | Memory size (MB) allocated by Yarn to a single container of the compute instance Worker | Default value: 10240 | + | | | | + | | | The value ranges from 1 to 2147483647. | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | vcore | Number of vCPUs (vCores) allocated by Yarn to a single container of the compute instance Worker | Default value: 1 | + | | | | + | | | The value ranges from 1 to 2147483647. | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | Quantity | Number of containers allocated by Yarn to the compute instance Worker | Default value: 2 | + | | | | + | | | The value ranges from 1 to 256. | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + | JVM | Log in to FusionInsight Manager and choose **Cluster** > **Services** > **HetuEngine** > **Configurations**. On the **All Configurations** tab page, search for **extraJavaOptions**. The value of this parameter in the **worker.jvm.config** parameter file is the value of the JVM parameter. | ``-`` | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+ + + d. Set parameters in the **Advanced Configuration** area. For details about the parameters, see :ref:`Table 4 `. + + .. _mrs_01_1731__en-us_topic_0000001173631202_table15562731135211: + + .. 
table:: **Table 4** Advanced configuration parameters + + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +=======================+==================================================================================================================================================================================================================================================================================================+===============+ + | Ratio of Query Memory | Ratio of the node query memory to the JVM memory. The default value is **0**. When this parameter is set to **0**, the calculation function is disabled. | 0 | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scaling | If auto scaling is enabled, you can increase or decrease the number of Workers without restarting the instance. However, the instance performance may deteriorate. For details about the parameters for enabling dynamic scaling, see :ref:`Adjusting the Number of Worker Nodes `. | OFF | + +-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + + e. .. _mrs_01_1731__en-us_topic_0000001173631202_li135621231135216: + + Configure **Custom Configuration** parameters. Choose **Advanced Configuration** > **Custom Configuration** and add custom parameters to the specified parameter file. Select the specified parameter file from the **Parameter File** drop-down list. + + - You can click **Add** to add custom configuration parameters. + + - You can click **Delete** to delete custom configuration parameters. + + - **resource-groups.json** takes effect only in the customized configuration of the Coordinator node. For details about the parameters for configuring resource groups, see :ref:`Table 5 `. + + .. _mrs_01_1731__en-us_topic_0000001173631202_table439014781612: + + .. table:: **Table 5** Resource pool group parameters + + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+ + | Parameter | Description | Example Value | + +=======================+========================================================================================================================================+==================================+ + | resourcegroups | Resource management group configuration of the cluster. Select **resource-groups.json** from the drop-down list of the parameter file. | .. 
code-block:: | + | | | | + | | | { | + | | | "rootGroups": [{ | + | | | "name": "global", | + | | | "softMemoryLimit": "100%", | + | | | "hardConcurrencyLimit": 1000, | + | | | "maxQueued": 10000 | + | | | }], | + | | | "selectors": [{ | + | | | "group": "global" | + | | | }] | + | | | } | + +-----------------------+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+ + + .. note:: + + For the **coordinator.config.properties**, **worker.config.properties**, **log.properties**, and **resource-groups.json** parameter files, if a user-defined parameter name already exists in the specified parameter file, the original parameter values in the parameter file are replaced with the customized parameter value. If the name of the custom parameter does not exist in the specified parameter file, the custom parameter is added to the specified parameter file. + + f. Determine whether to start the instance immediately after the configuration is complete. + + - Select **Start Now** to start the instance immediately after the configuration is complete. + - Deselect **Start Now** and manually start the instance after the configuration is complete. + +#. Click **OK** and wait until the instance configuration is complete. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hetuengine_function_plugin_development_and_application.rst b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hetuengine_function_plugin_development_and_application.rst new file mode 100644 index 0000000..fc9c53b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hetuengine_function_plugin_development_and_application.rst @@ -0,0 +1,250 @@ +:original_name: mrs_01_2339.html + +.. _mrs_01_2339: + +HetuEngine Function Plugin Development and Application +====================================================== + +You can customize functions to extend SQL statements to meet personalized requirements. These functions are called UDFs. + +This section describes how to develop and apply HetuEngine function plugins. + +Developing Function Plugins +--------------------------- + +This sample implements two function plugins described in the following table. + +.. table:: **Table 1** HetuEngine function plugins + + +------------+----------------------------------------------------------------------------------------------------------------+---------------------+ + | Parameter | Description | Type | + +============+================================================================================================================+=====================+ + | add_two | Adds **2** to the input integer and returns the result. | ScalarFunction | + +------------+----------------------------------------------------------------------------------------------------------------+---------------------+ + | avg_double | Aggregates and calculates the average value of a specified column. The field type of the column is **double**. | AggregationFunction | + +------------+----------------------------------------------------------------------------------------------------------------+---------------------+ + +#. Create a Maven project. Set **groupId** to **com.test.udf** and **artifactId** to **udf-test**. The two values can be customized based on the site requirements. + +#. 
Modify the **pom.xml** file as follows:
+
+   .. code-block::
+
+      <project>
+          <modelVersion>4.0.0</modelVersion>
+          <groupId>com.test.udf</groupId>
+          <artifactId>udf-test</artifactId>
+          <version>0.0.1-SNAPSHOT</version>
+
+          <packaging>hetu-plugin</packaging>
+
+          <dependencies>
+              <dependency>
+                  <groupId>com.google.guava</groupId>
+                  <artifactId>guava</artifactId>
+                  <version>26.0-jre</version>
+              </dependency>
+
+              <dependency>
+                  <groupId>io.hetu.core</groupId>
+                  <artifactId>presto-spi</artifactId>
+                  <version>1.2.0</version>
+                  <scope>provided</scope>
+              </dependency>
+          </dependencies>
+
+          <build>
+              <plugins>
+                  <plugin>
+                      <groupId>org.apache.maven.plugins</groupId>
+                      <artifactId>maven-assembly-plugin</artifactId>
+                      <version>2.4.1</version>
+                      <configuration>
+                          <encoding>UTF-8</encoding>
+                      </configuration>
+                  </plugin>
+                  <plugin>
+                      <groupId>io.hetu</groupId>
+                      <artifactId>presto-maven-plugin</artifactId>
+                      <version>9</version>
+                      <extensions>true</extensions>
+                  </plugin>
+              </plugins>
+          </build>
+      </project>
+
+#. Create the implementation classes of the function plugins.
+
+   1. Create the function plugin implementation class **com.hadoop.other.TestUDF4**. The code is as follows:
+
+      .. code-block::
+
+         public class TestUDF4 {
+             @ScalarFunction("add_two")
+             @SqlType(StandardTypes.INTEGER)
+             public static long add2(@SqlNullable @SqlType(StandardTypes.INTEGER) Long i) {
+                 return i + 2;
+             }
+         }
+
+   2. Create the function plugin implementation class **com.hadoop.other.AverageAggregation**. The code is as follows:
+
+      .. code-block::
+
+         @AggregationFunction("avg_double")
+         public class AverageAggregation
+         {
+             @InputFunction
+             public static void input(
+                     LongAndDoubleState state,
+                     @SqlType(StandardTypes.DOUBLE) double value)
+             {
+                 state.setLong(state.getLong() + 1);
+                 state.setDouble(state.getDouble() + value);
+             }
+
+             @CombineFunction
+             public static void combine(
+                     LongAndDoubleState state,
+                     LongAndDoubleState otherState)
+             {
+                 state.setLong(state.getLong() + otherState.getLong());
+                 state.setDouble(state.getDouble() + otherState.getDouble());
+             }
+
+             @OutputFunction(StandardTypes.DOUBLE)
+             public static void output(LongAndDoubleState state, BlockBuilder out)
+             {
+                 long count = state.getLong();
+                 if (count == 0) {
+                     out.appendNull();
+                 }
+                 else {
+                     double value = state.getDouble();
+                     DOUBLE.writeDouble(out, value / count);
+                 }
+             }
+         }
+
+#. Create the **com.hadoop.other.LongAndDoubleState** interface on which **AverageAggregation** depends.
+
+   .. code-block::
+
+      public interface LongAndDoubleState extends AccumulatorState {
+          long getLong();
+
+          void setLong(long value);
+
+          double getDouble();
+
+          void setDouble(double value);
+      }
+
+#. Create the function plugin registration class **com.hadoop.other.RegisterFunctionTestPlugin**. The code is as follows:
+
+   .. code-block::
+
+      public class RegisterFunctionTestPlugin implements Plugin {
+
+          @Override
+          public Set<Class<?>> getFunctions() {
+              return ImmutableSet.<Class<?>>builder()
+                      .add(TestUDF4.class)
+                      .add(AverageAggregation.class)
+                      .build();
+          }
+      }
+
+#. Package the Maven project and obtain the **udf-test-0.0.1-SNAPSHOT** directory in the **target** directory. The following figure shows the overall structure of the project.
+
+   |image1|
+
+Deploying Function Plugins
+--------------------------
+
+Before the deployment, ensure that:
+
+- The HetuEngine service is normal.
+- The HDFS and HetuEngine clients have been installed on a cluster node, for example, in the **/opt/client** directory.
+- A HetuEngine user has been created. For details about how to create a user, see :ref:`Creating a HetuEngine User `.
+
+#. Upload the **udf-test-0.0.1-SNAPSHOT** directory obtained after packaging the Maven project to any directory on the node where the client is installed.
+#. Upload the **udf-test-0.0.1-SNAPSHOT** directory to HDFS.
+
+   a. Log in to the node where the client is installed and perform security authentication.
+
+      **cd /opt/client**
+
+      **source bigdata_env**
+
+      **kinit** *HetuEngine user*
+
+      Enter the password as prompted and change the password upon the first authentication.
+
+   b. Create the following paths in HDFS. If the paths already exist, skip this step.
+ + **hdfs dfs -mkdir -p /user/hetuserver/udf/data/externalFunctionsPlugin** + + c. Upload the **udf-test-0.0.1-SNAPSHOT** directory to HDFS. + + **hdfs dfs -put udf-test-0.0.1-SNAPSHOT /user/hetuserver/udf/data/externalFunctionsPlugin** + + d. Change the directory owner and owner group. + + **hdfs dfs -chown -R hetuserver:hadoop /user/hetuserver/udf/data** + +#. Restart the HetuEngine compute instance. + +Verifying Function Plugins +-------------------------- + +#. Log in to the node where the client is installed and perform security authentication. + + **cd /opt/client** + + **source bigdata_env** + + **kinit** *HetuEngine user* + + **hetu-cli --catalog hive --schema default** + +#. Verify function plugins. + + a. Query a table. + + **select \* from test1;** + + .. code-block:: + + select * from test1; + name | price + --------|------- + apple | 17.8 + orange | 25.0 + (2 rows) + + b. Return the average value. + + **select avg_double(price) from test1;** + + .. code-block:: + + select avg_double(price) from test1; + _col0 + ------- + 21.4 + (1 row) + + c. Return the value of the input integer plus 2. + + **select add_two(4);** + + .. code-block:: + + select add_two(4); + _col0 + ------- + 6 + (1 row) + +.. |image1| image:: /_static/images/en-us_image_0000001295740088.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hive_udf_development_and_application.rst b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hive_udf_development_and_application.rst new file mode 100644 index 0000000..96ee4ee --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/hive_udf_development_and_application.rst @@ -0,0 +1,210 @@ +:original_name: mrs_01_1743.html + +.. _mrs_01_1743: + +Hive UDF Development and Application +==================================== + +You can customize functions to extend SQL statements to meet personalized requirements. These functions are called UDFs. + +This section describes how to develop and apply Hive UDFs. + +Developing Hive UDFs +-------------------- + +This sample implements one Hive UDF described in the following table. + +.. table:: **Table 1** Hive UDF + + ========== ===================================================== + Parameter Description + ========== ===================================================== + AutoAddOne Adds **1** to the input value and returns the result. + ========== ===================================================== + +.. note:: + + - A common Hive UDF must be inherited from **org.apache.hadoop.hive.ql.exec.UDF**. + + - A common Hive UDF must implement at least one **evaluate()**. The **evaluate** function supports overloading. + + - Currently, only the following data types are supported: + + - boolean, byte, short, int, long, float, and double + - Boolean, Byte, Short, Int, Long, Float, and Double + - List and Map + + UDFs, UDAFs, and UDTFs currently do not support complex data types other than the preceding ones. + + - Currently, Hive UDFs supports only less than or equal to five input parameters. UDFs with more than five input parameters will fail to be registered. + + - If the input parameter of a Hive UDF is **null**, the call returns **null** directly without parsing the Hive UDF logic. As a result, the UDF execution result may be inconsistent with the Hive execution result. 
+
+   - To add the **hive-exec-3.1.1** dependency package to the Maven project, you can obtain the package from the Hive installation directory.
+
+   - (Optional) If the Hive UDF depends on a configuration file, you are advised to save the configuration file as a resource file in the **resources** directory so that it can be packaged into the Hive UDF function package.
+
+#. Create a Maven project. Set **groupId** to **com.test.udf** and **artifactId** to **udf-test**. The two values can be customized based on the site requirements.
+
+#. Modify the **pom.xml** file as follows:
+
+   .. code-block::
+
+      <project>
+          <modelVersion>4.0.0</modelVersion>
+          <groupId>com.test.udf</groupId>
+          <artifactId>udf-test</artifactId>
+          <version>0.0.1-SNAPSHOT</version>
+
+          <dependencies>
+              <dependency>
+                  <groupId>org.apache.hive</groupId>
+                  <artifactId>hive-exec</artifactId>
+                  <version>3.1.1</version>
+              </dependency>
+          </dependencies>
+
+          <build>
+              <plugins>
+                  <plugin>
+                      <artifactId>maven-shade-plugin</artifactId>
+                      <executions>
+                          <execution>
+                              <phase>package</phase>
+                              <goals>
+                                  <goal>shade</goal>
+                              </goals>
+                          </execution>
+                      </executions>
+                  </plugin>
+                  <plugin>
+                      <artifactId>maven-resources-plugin</artifactId>
+                      <executions>
+                          <execution>
+                              <id>copy-resources</id>
+                              <phase>package</phase>
+                              <goals>
+                                  <goal>copy-resources</goal>
+                              </goals>
+                              <configuration>
+                                  <outputDirectory>${project.build.directory}/</outputDirectory>
+                                  <resources>
+                                      <resource>
+                                          <directory>src/main/resources/</directory>
+                                          <filtering>false</filtering>
+                                      </resource>
+                                  </resources>
+                              </configuration>
+                          </execution>
+                      </executions>
+                  </plugin>
+              </plugins>
+          </build>
+      </project>
+
+#. Create the implementation class of the Hive UDF.
+
+   .. code-block::
+
+      import org.apache.hadoop.hive.ql.exec.UDF;
+
+      /**
+       * AutoAddOne
+       *
+       * @since 2020-08-24
+       */
+      public class AutoAddOne extends UDF {
+          public int evaluate(int data) {
+              return data + 1;
+          }
+      }
+
+#. Package the Maven project. The **udf-test-0.0.1-SNAPSHOT.jar** file in the **target** directory is the Hive UDF function package.
+
+Configuring Hive UDFs
+---------------------
+
+In the **udf.properties** configuration file, add registration information in the "Function_name Class_path" format, one entry per line.
+
+The following example registers four Hive UDFs in the **udf.properties** configuration file:
+
+.. code-block::
+
+   booleanudf io.hetu.core.hive.dynamicfunctions.examples.udf.BooleanUDF
+   shortudf io.hetu.core.hive.dynamicfunctions.examples.udf.ShortUDF
+   byteudf io.hetu.core.hive.dynamicfunctions.examples.udf.ByteUDF
+   intudf io.hetu.core.hive.dynamicfunctions.examples.udf.IntUDF
+
+.. note::
+
+   - If the added Hive UDF registration information is incorrect, for example, the format is wrong or the class path does not exist, the system ignores the incorrect registration information and prints the corresponding logs.
+   - If duplicate Hive UDFs are registered, the system registers the UDF only once and ignores the duplicate registrations.
+   - If the Hive UDF to be registered is the same as one already registered in the system, the system throws an exception and cannot be started properly. To solve this problem, delete the duplicate Hive UDF registration information.
+
+Deploying Hive UDFs
+-------------------
+
+To use an existing Hive UDF in HetuEngine, upload the UDF function package, the **udf.properties** file, and the configuration files on which the UDF depends to the specified HDFS directory, for example, **/user/hetuserver/udf/**, and then restart the HetuEngine compute instance.
+
+#. Create the **/user/hetuserver/udf/data/externalFunctions** directory, save the **udf.properties** file in the **/user/hetuserver/udf** directory, save the UDF function package in the **/user/hetuserver/udf/data/externalFunctions** directory, and save the configuration files on which the UDF depends in the **/user/hetuserver/udf/data** directory.
+
+   - Upload the files on the HDFS page:
+
+     a. Log in to FusionInsight Manager using the HetuEngine username and choose **Cluster** > **Services** > **HDFS**.
+     b. In the **Basic Information** area on the **Dashboard** page, click the link next to **NameNode WebUI**.
+     c.
Choose **Utilities** > **Browse the file system** and click |image1| to create the **/user/hetuserver/udf/data/externalFunctions** directory. + d. Go to **/user/hetuserver/udf** and click |image2| to upload the **udf.properties** file. + e. Go to the **/user/hetuserver/udf/data/** directory and click |image3| to upload the configuration file on which the UDF depends. + f. Go to the **/user/hetuserver/udf/data/externalFunctions** directory and click |image4| to upload the UDF function package. + + - Use the HDFS CLI to upload the files. + + a. Log in to the node where the HDFS service client is located and switch to the client installation directory, for example, **/opt/client**. + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip user authentication. + + **kinit** *HetuEngine* *username* + + Enter the password as prompted. + + d. Run the following commands to create directories and upload the prepared UDF function package, **udf.properties** file, and configuration file on which the UDF depends to the target directories: + + **hdfs dfs -mkdir** **/user/hetuserver/udf/data/externalFunctions** + + **hdfs dfs -put ./**\ *Configuration files on which the UDF depends* **/user/hetuserver/udf/data** + + **hdfs dfs -put ./udf.properties /user/hetuserver/udf** + + **hdfs dfs -put ./**\ *UDF function package* **/user/hetuserver/udf/data/externalFunctions** + +#. Restart the HetuEngine compute instance. + +Using Hive UDFs +--------------- + +Use a client to access a Hive UDF: + +#. Log in to the HetuEngine client. For details, see :ref:`Using the HetuEngine Client `. + +#. Run the following command to use a Hive UDF: + + **select AutoAddOne(1);** + + .. code-block:: + + select AutoAddOne(1); + _col0 + ------- + 2 + (1 row) + +.. |image1| image:: /_static/images/en-us_image_0000001295740228.png +.. |image2| image:: /_static/images/en-us_image_0000001349259325.png +.. |image3| image:: /_static/images/en-us_image_0000001349059877.png +.. |image4| image:: /_static/images/en-us_image_0000001296060032.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/index.rst new file mode 100644 index 0000000..b3916fd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/function_&_udf_development_and_application/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_2338.html + +.. _mrs_01_2338: + +Function & UDF Development and Application +========================================== + +- :ref:`HetuEngine Function Plugin Development and Application ` +- :ref:`Hive UDF Development and Application ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + hetuengine_function_plugin_development_and_application + hive_udf_development_and_application diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_cluster_node_resource_configurations.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_cluster_node_resource_configurations.rst new file mode 100644 index 0000000..1312225 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_cluster_node_resource_configurations.rst @@ -0,0 +1,66 @@ +:original_name: mrs_01_1741.html + +.. _mrs_01_1741: + +Adjusting Cluster Node Resource Configurations +============================================== + +Scenario +-------- + +The default memory size and disk overflow path of HetuEngine are not the best. You need to adjust node resources in the cluster based on the actual service and server configuration of the cluster to achieve the optimal performance. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations** and adjust the cluster node resource parameters by referring to :ref:`Table 1 `. + + .. _mrs_01_1741__en-us_topic_0000001173789314_table134641948145817: + + .. table:: **Table 1** Parameters for configuring cluster node resources + + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | Parameter | Default Value | Recommended Value | Description | Parameter File | + +=========================================================+==========================================================+========================================================================================================+========================================================================+========================================================+ + | yarn.hetuserver.engine.coordinator.memory | 5120 | At least 2 GB less than that of **yarn.scheduler.maximum-allocation-mb** | Memory size used by a Coordinator node | application.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | yarn.hetuserver.engine.coordinator.number-of-containers | 2 | 2 | Number of Coordinator nodes | application.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | yarn.hetuserver.engine.coordinator.number-of-cpus | 1 | At least two vCores less than **yarn.scheduler.maximum-allocation-vcores** | CPU vCores used by a Coordinator node | application.properties | + 
+---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | yarn.hetuserver.engine.worker.memory | 10240 | At least 2 GB less than that of **yarn.scheduler.maximum-allocation-mb** | Memory size used by a worker node | application.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | yarn.hetuserver.engine.worker.number-of-containers | 2 | Adjusted based on application requirements | Number of worker nodes | application.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | yarn.hetuserver.engine.worker.number-of-cpus | 1 | At least two vCores less than **yarn.scheduler.maximum-allocation-vcores** | CPU vCores used by a Worker node | application.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | **Xmx** size in the **extraJavaOptions** parameter | 8GB | Max (*Memory size used by a Worker node* - 30 GB, *Memory size used by a Worker node* x 0.7) | Maximum available memory of the worker JVM process | worker.jvm.config | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | query.max-memory-per-node | 5GB | Worker JVM x 0.7 | Maximum available memory of a Query node | worker.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | query.max-total-memory-per-node | 5GB | Worker JVM x 0.7 | Maximum available memory of a Query + System node | worker.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | 
memory.heap-headroom-per-node | 3 GB | Worker JVM x 0.3 | Maximum available memory of a system heap node | worker.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | **Xmx** size in the **extraJavaOptions** parameter | 4GB | Max (*Memory size used by a Coordinator node* - 30 GB, *Memory size used by a Coordinator node* x 0.7) | Maximum available memory of the Coordinator JVM process | coordinator.jvm.config | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | query.max-memory-per-node | 3GB | Coordinator JVM x 0.7 | Maximum memory that can be used for node query | coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | query.max-total-memory-per-node | 3GB | Coordinator JVM x 0.7 | Maximum available memory of a Query + System node | coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | memory.heap-headroom-per-node | 1GB | Coordinator JVM x 0.3 | Maximum available memory of a system heap node | coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | query.max-memory | 7GB | Sum(query.max-memory-per-node) x 0.7 | Maximum available memory of a Query cluster | worker.config.properties/coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | experimental.spiller-spill-path | CONTAINER_ROOT_PATH/tmp/hetuserver/hetuserver-sqlengine/ | One or more independent SSDs | Disk output file path | worker.config.properties/coordinator.config.properties | + 
+---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | experimental.max-spill-per-node | 10 GB | Sum(Available space of each node) x 50% | Available disk space for storing output files of all queries on a node | worker.config.properties/coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + | experimental.query-max-spill-per-node | 10 GB | 80% of the available disk space on a node | Available disk space for storing output files of a query on a node | worker.config.properties/coordinator.config.properties | + +---------------------------------------------------------+----------------------------------------------------------+--------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------+--------------------------------------------------------+ + +#. Click **Save**. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service** to restart the HetuEngine service for the parameters to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_execution_plan_cache.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_execution_plan_cache.rst new file mode 100644 index 0000000..81490ae --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_execution_plan_cache.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_1742.html + +.. _mrs_01_1742: + +Adjusting Execution Plan Cache +============================== + +Scenario +-------- + +HetuEngine provides the execution plan cache function. For the same query that needs to be executed for multiple times, this function reduces the time required for generating the execution plans for the same query. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations** and adjust the execution plan cache parameters by referring to :ref:`Table 1 `. + + .. _mrs_01_1742__en-us_topic_0000001219350551_table0600139102317: + + .. 
table:: **Table 1** Execution plan cache parameters + + +----------------------------------+---------------+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+ + | Parameter | Default Value | Recommended Value | Description | Parameter File | + +==================================+===============+=====================================================+=====================================================================================================+====================================================================+ + | hetu.executionplan.cache.enabled | false | true | Indicates whether to enable the global execution plan cache. | **coordinator.config.properties** and **worker.config.properties** | + +----------------------------------+---------------+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+ + | hetu.executionplan.cache.limit | 20000 | Adjust the value based on application requirements. | Indicates the maximum number of execution plans that can be cached. | **coordinator.config.properties** and **worker.config.properties** | + +----------------------------------+---------------+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+ + | hetu.executionplan.cache.timeout | 86400000 | Adjust the value based on application requirements. | Indicates the timeout interval of the cached execution plan since the last access, in milliseconds. | **coordinator.config.properties** and **worker.config.properties** | + +----------------------------------+---------------+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------+ + +#. Click **Save**. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service** to restart the HetuEngine service for the parameters to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_metadata_cache.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_metadata_cache.rst new file mode 100644 index 0000000..fbbe5f4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_metadata_cache.rst @@ -0,0 +1,38 @@ +:original_name: mrs_01_1746.html + +.. _mrs_01_1746: + +Adjusting Metadata Cache +======================== + +Scenario +-------- + +When HetuEngine accesses the Hive data source, it needs to access the Hive metastore to obtain the metadata information. HetuEngine provides the metadata cache function. When the database or table of the Hive data source is accessed for the first time, the metadata information (database name, table name, table field, partition information, and permission information) of the database or table is cached, the Hive metastore does not need to be accessed again during subsequent access. 
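+
+These cache settings belong to the **hive.properties** parameter file, as described in the procedure below. The following is only an illustrative sketch of how they might be set as custom parameters; the values are assumptions and must be tuned for the actual service scenario.
+
+.. code-block::
+
+   # Illustrative values only: keep cached metadata for 30 minutes and refresh it every 5 minutes
+   hive.metastore-cache-ttl=30m
+   hive.metastore-cache-maximum-size=20000
+   hive.metastore-refresh-interval=5m
+   hive.per-transaction-metastore-cache-maximum-size=2000
+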
If the table data of the Hive data source does not change frequently, the query performance can be improved to some extent. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations** and adjust the metadata cache parameters by referring to :ref:`Table 1 `. + + .. _mrs_01_1746__en-us_topic_0000001173630766_table1201657173018: + + .. table:: **Table 1** Metadata cache parameters + + +---------------------------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------------+ + | Parameter | Description | Default Value | Parameter File | + +===================================================+==============================================================================================+===============+=================+ + | hive.metastore-cache-ttl | Cache duration of the metadata of the co-deployed Hive data source. | 0s | hive.properties | + +---------------------------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------------+ + | hive.metastore-cache-maximum-size | Maximum cache size of the metadata of the co-deployed Hive data source. | 10000 | hive.properties | + +---------------------------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------------+ + | hive.metastore-refresh-interval | Interval for refreshing the metadata of the co-deployed Hive data source. | 1s | hive.properties | + +---------------------------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------------+ + | hive.per-transaction-metastore-cache-maximum-size | Maximum cache size of the metadata for each transaction of the co-deployed Hive data source. | 1000 | hive.properties | + +---------------------------------------------------+----------------------------------------------------------------------------------------------+---------------+-----------------+ + +#. Click **Save**. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service** to restart the HetuEngine service for the parameters to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_the_yarn_service_configuration.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_the_yarn_service_configuration.rst new file mode 100644 index 0000000..8f750f4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/adjusting_the_yarn_service_configuration.rst @@ -0,0 +1,38 @@ +:original_name: mrs_01_1740.html + +.. _mrs_01_1740: + +Adjusting the Yarn Service Configuration +======================================== + +Scenario +-------- + +HetuEngine depends on the resource allocation and control capabilities provided by Yarn. You need to adjust the Yarn service configuration based on the actual service and cluster server configuration to achieve the optimal performance. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **Yarn** > **Configurations** > **All Configurations** and set Yarn service parameters by referring to :ref:`Table 1 `. + + .. 
_mrs_01_1740__en-us_topic_0000001173949262_table49551729155011: + + .. table:: **Table 1** Yarn configuration parameters + + +------------------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Recommended Value | + +==========================================+===============+==========================================================================================================================+ + | yarn.nodemanager.resource.memory-mb | 16384 | To achieve the optimal performance, set this parameter to 90% of the minimum physical memory of the node in the cluster. | + +------------------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+ + | yarn.nodemanager.resource.cpu-vcores | 8 | To achieve the optimal performance, set this parameter to the minimum number of vCores of the node in the cluster. | + +------------------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+ + | yarn.scheduler.maximum-allocation-mb | 65536 | To achieve the optimal performance, set this parameter to 90% of the minimum physical memory of the node in the cluster. | + +------------------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+ + | yarn.scheduler.maximum-allocation-vcores | 32 | To achieve the optimal performance, set this parameter to the minimum number of vCores of the node in the cluster. | + +------------------------------------------+---------------+--------------------------------------------------------------------------------------------------------------------------+ + +#. Click **Save**. + +#. Choose **Cluster** > **Services** > **Yarn** > **More** > **Restart Service** to restart the Yarn service for the parameters to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/index.rst new file mode 100644 index 0000000..edbb1d1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1745.html + +.. _mrs_01_1745: + +HetuEngine Performance Tuning +============================= + +- :ref:`Adjusting the Yarn Service Configuration ` +- :ref:`Adjusting Cluster Node Resource Configurations ` +- :ref:`Adjusting Execution Plan Cache ` +- :ref:`Adjusting Metadata Cache ` +- :ref:`Modifying the CTE Configuration ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + adjusting_the_yarn_service_configuration + adjusting_cluster_node_resource_configurations + adjusting_execution_plan_cache + adjusting_metadata_cache + modifying_the_cte_configuration diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/modifying_the_cte_configuration.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/modifying_the_cte_configuration.rst new file mode 100644 index 0000000..dc566de --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_performance_tuning/modifying_the_cte_configuration.rst @@ -0,0 +1,38 @@ +:original_name: mrs_01_24181.html + +.. _mrs_01_24181: + +Modifying the CTE Configuration +=============================== + +Scenario +-------- + +If a table or common table expression (CTE) contained in a query appears multiple times and has the same projection and filter, you can enable the CTE reuse function to cache data in memory. In this way, you do not need to read data from disks for multiple times, reducing the time required for query execution. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations** and configure related parameters by referring to :ref:`Table 1 `. + + .. _mrs_01_24181__en-us_topic_0000001219231083_table1201657173018: + + .. table:: **Table 1** CTE configuration parameters + + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------------+--------------------------------------------------------------------+ + | Parameter | Description | Recommended Value | Default Value | Parameter File | + +=======================================+=======================================================================================================================================================================+===================+===============+====================================================================+ + | optimizer.reuse-table-scan | Whether to enable the CTE table data reuse function. | true | false | **coordinator.config.properties** and **worker.config.properties** | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------------+--------------------------------------------------------------------+ + | experimental.spill-reuse-tablescan | Whether to enable the function of spilling memory to disks during tablescan reuse. | true | false | **coordinator.config.properties** and **worker.config.properties** | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------------+--------------------------------------------------------------------+ + | optimizer.cte-reuse-enabled | Whether to enable CTE reuse. If this function is enabled, CTE is executed only once irrespective of the number of times the same CTE is being used in the main query. 
| true | false | **coordinator.config.properties** and **worker.config.properties** | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------------+--------------------------------------------------------------------+ + | dynamic-filtering-max-per-driver-size | Maximum volume of data that can be collected by each driver when dynamic filtering starts. | 100MB | 1MB | **coordinator.config.properties** and **worker.config.properties** | + +---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------------+--------------------------------------------------------------------+ + +#. Click **Save**. + +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service** and enter the password to restart the HetuEngine service for the parameters to take effect. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/creating_a_hetuengine_user.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/creating_a_hetuengine_user.rst new file mode 100644 index 0000000..f4624d2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/creating_a_hetuengine_user.rst @@ -0,0 +1,59 @@ +:original_name: mrs_01_1714.html + +.. _mrs_01_1714: + +Creating a HetuEngine User +========================== + +Scenarios +--------- + +Before using the HetuEngine service in a security cluster, a cluster administrator needs to create a user and grant operation permissions to the user to meet service requirements. + +HetuEngine users are classified into administrators and common users. The default HetuEngine administrator group is **hetuadmin**, and the user group of HetuEngine common users is **hetuuser**. + +- Users associated with the **hetuadmin** user group can obtain the O&M administrator permissions on the HetuEngine HSConsole web UI and HetuEngine compute instance web UI. +- Users associated with the **hetuuser** user group can obtain the SQL execution permission. + +If Ranger authentication is enabled and you need to configure the permissions to manage databases, tables, and columns of data sources for a user after it is created, see :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. + +Prerequisites +------------- + +Before using the HetuEngine service, ensure that the tenant to be associated with the HetuEngine user has been planned and created. For details about how to create and use a tenant in an MRS cluster. + +.. note:: + + Common users can only perform operations on and view information about clusters of the tenants associated with them. + +Procedure +--------- + +**Creating a HetuEngine administrator** + +#. Log in to FusionInsight Manager. +#. Choose **System** > **Permission** > **User** > **Create**. +#. Enter a username, for example, **hetu_admin**. +#. Set **User Type** to **Human-machine**. +#. Set **New Password** and **Confirm Password**. +#. In the **User Group** area, click **Add** to add the **hive**, **hetuadmin**, **hadoop**, **hetuuser**, and **yarnviewgroup** user groups for the user. +#. In the **Primary Group** drop-down list, select **hive** as the primary group. 
+#. In the **Role** area, click **Add** to assign the **default**, **System_administrator**, and desired tenant role permissions to the user. +#. Click **OK**. + +**Creating a common HetuEngine user** + +#. Log in to FusionInsight Manager. +#. Choose **System** > **Permission** > **User** > **Create**. +#. Enter a username, for example, **hetu_test**. +#. Set **User Type** to **Human-machine**. +#. Set **New Password** and **Confirm Password**. +#. In the **User Group** area, click **Add** to add the **hetuuser** user group for the user. + + .. note:: + + - Ranger authentication is enabled for the HetuEngine service in the MRS cluster by default. HetuEngine common users only need to be associated with the **hetuuser** user group. If Ranger authentication is disabled, you must associate the user with the **hive** user group and set it as the primary group. Otherwise, the HetuEngine service may be unavailable. + - If Ranger authentication is enabled and you need to configure the permissions to manage databases, tables, and columns of data sources for a user after it is created, see :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. + +#. In the **Role** area, click **Add** to assign the **default** or desired tenant role permissions to the user. +#. Click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables,_columns,_and_databases.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables,_columns,_and_databases.rst new file mode 100644 index 0000000..cd3354c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables,_columns,_and_databases.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_2352.html + +.. _mrs_01_2352: + +Configuring Permissions for Tables, Columns, and Databases +========================================================== + +If a user needs to access HetuEngine tables or databases created by other users, the user needs to be granted with related permissions. HetuEngine supports permission control based on columns for strict permission control. If a user needs to access some columns in tables created by other users, the user must be granted the permission for columns. The following describes how to grant table, column, and database permissions to users by using the role management function of Manager. + +Procedure +--------- + +The operations for granting permissions on HetuEngine tables, columns, and databases are the same as those for Hive. + +.. note:: + + - Any permission for a table in the database is automatically associated with the HDFS permission for the database directory to facilitate permission management. When any permission for a table is canceled, the system does not automatically cancel the HDFS permission for the database directory to ensure performance. In this case, users can only log in to the database and view table names. + - When the query permission on a database is added to or deleted from a role, the query permission on tables in the database is automatically added to or deleted from the role. This mechanism is inherited from Hive. 
+ - In HetuEngine, the name of a column of the **struct** type data cannot contain special characters, that is, characters other than letters, digits, and underscores (_). If the column name of the struct data type contains special characters, the column cannot be displayed on the FusionInsight Manager console when you grant permissions to roles on the **Role** page. + +Concepts +-------- + +:ref:`Table 1 ` describes the permission requirements when SQL statements are processed in HetuEngine. + +.. _mrs_01_2352__en-us_topic_0000001173789716_t61b1f27ae37c4015ac2596a8c29aa39e: + +.. table:: **Table 1** Using HetuEngine tables, columns, or data + + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Required Permission | + +===================================+=============================================================================================================================================+ + | DESCRIBE TABLE | Select | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | ANALYZE TABLE | Select and Insert | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | SHOW COLUMNS | Select | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | SHOW TABLE STATUS | Select | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | SHOW TABLE PROPERTIES | Select | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | SELECT | Select | + | | | + | | .. note:: | + | | | + | | To perform the SELECT operation on a view, you must have the **Select** permission on the view and the tables corresponding to the view. 
| + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | EXPLAIN | Select | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | CREATE VIEW | Select, Grant Of Select, and Create | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | CREATE TABLE | Create | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | ALTER TABLE ADD PARTITION | Insert | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | INSERT | Insert | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | INSERT OVERWRITE | Insert and Delete | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | ALTER TABLE DROP PARTITION | The table-level Alter and Delete, and column-level Select permissions need to be granted. | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | ALTER DATABASE | Hive Admin Privilege | + +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/creating_a_hetuengine_role.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/creating_a_hetuengine_role.rst new file mode 100644 index 0000000..df89aee --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/creating_a_hetuengine_role.rst @@ -0,0 +1,48 @@ +:original_name: mrs_01_2350.html + +.. _mrs_01_2350: + +Creating a HetuEngine Role +========================== + +The system administrator can create and set a HetuEngine role on FusionInsight Manager. The HetuEngine role can be configured with the HetuEngine administrator permission or the permission of performing operations on the table data. + +Creating a database with Hive requires users to join in the Hive group, without granting a role. Users have all permissions on the databases or tables created by themselves in Hive or HDFS. They can create tables, select, delete, insert, or update data, and grant permissions to other users to allow them to access the tables and corresponding HDFS directories and files. The created databases or tables are saved in the **/user/hive/warehouse** directory of HDFS by default. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. 
Choose **System** > **Permission** > **Role**. + +#. Click **Create Role**, and set **Role Name** and **Description**. + +#. In the **Configure Resource Permission** area, choose *Name of the desired cluster* > **Hive** and set role permissions. For details, see :ref:`Table 1 `. + + - **Hive Admin Privilege**: Hive administrator permission. + - **Hive Read Write Privileges**: Hive data table management permission, which is the operation permission to set and manage the data of created tables. + + .. note:: + + - Hive role management supports the Hive administrator permission, and the permissions of accessing tables and views, without granting the database permission. + - The permissions of the Hive administrator do not include the permission to manage HDFS. + - If there are too many tables in the database or too many files in tables, the permission granting may last a while. For example, if a table contains 10,000 files, the permission granting lasts about 2 minutes. + + .. _mrs_01_2350__en-us_topic_0000001173631158_en-us_topic_0254454613_table1148121718119: + + .. table:: **Table 1** Setting a role + + +------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+ + | Scenario | Role Authorization | + +==========================================================================================+=========================================================================================================+ + | Setting the permission to query a table of another user in the default database | a. In the **View Name** area, click **Hive Read Write Privileges**. | + | | b. Click the name of the specified database in the database list. Tables in the database are displayed. | + | | c. In the **Permission** column of a specified table, choose **Select**. | + +------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+ + | Setting the permission to import data to a table of another user in the default database | a. In the **View Name** area, click **Hive Read Write Privileges**. | + | | b. Click the name of the specified database in the database list. Tables in the database are displayed. | + | | c. In the **Permission** column of the specified indexes, select **Delete** and **Insert**. | + +------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+ + +5. Click **OK**. Return to the **Role** page. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst new file mode 100644 index 0000000..aa82051 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_1724.html + +.. 
_mrs_01_1724: + +HetuEngine MetaStore-based Permission Control +============================================= + +- :ref:`Overview ` +- :ref:`Creating a HetuEngine Role ` +- :ref:`Configuring Permissions for Tables, Columns, and Databases ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + overview + creating_a_hetuengine_role + configuring_permissions_for_tables,_columns,_and_databases diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst new file mode 100644 index 0000000..28e7245 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst @@ -0,0 +1,83 @@ +:original_name: mrs_01_1725.html + +.. _mrs_01_1725: + +Overview +======== + +Constraints: This parameter applies only to the Hive data source. + +When multiple HetuEngine clusters are deployed for collaborative computing, the metadata is centrally managed by the management cluster. Data computing is performed in all clusters. The user permission for accessing HetuEngine clusters must be configured in the management cluster. Users who belong to the Hive user group and share the same name are added to all compute instances. + +MetaStore Permission +-------------------- + +Similar to Hive, HetuEngine is a data warehouse framework built on Hadoop, providing storage of structured data like SQL. + +Permissions in a cluster must be assigned to roles which are bound to users or user groups. Users can obtain permissions only by binding a role or joining a group that is bound with a role. + +Permission Management +--------------------- + +HetuEngine permission management is performed by the permission system to manage users' operations on the database, ensuring that different users can operate databases independently and securely. A user can operate another user's tables and databases only with the corresponding permissions. Otherwise, operations will be rejected. + +HetuEngine permission management integrates the functions of Hive permission management. MetaStore service of Hive and the function of granting permissions on the web page are required to enable the HetuEngine permission management. + +- Granting permissions on the web page: HetuEngine only supports granting permissions on the web page. On Manager, choose **System** > **Permission** to add or delete a user, user group, or a role, and to grant permissions or cancel permissions. +- Obtaining and judging a service: When the DDL and DML commands are received from the client, HetuEngine will obtain the client user's permissions on database information from MetaStore, and check whether the required permissions are included. If the required permissions have been obtained, the user's operations are allowed. If the permissions are not obtained, the user's operation will be rejected. After the MetaStore permissions are checked, ACL permission also needs to be checked on HDFS. + +HetuEngine Permission Model +--------------------------- + +If a user uses HetuEngine to perform SQL query, the user must be granted with permissions of HetuEngine databases and tables (include external tables and views). The complete permission model of HetuEngine consists of the metadata permission and HDFS file permission. 
Permissions required to use a database or a table are just one type of HetuEngine permission. + +- Metadata permissions + + Metadata permissions are controlled at the metadata level. Similar to traditional relational databases, the HetuEngine database contains the CREATE and SELECT permissions. Tables and columns contain the SELECT, INSERT, UPDATE, and DELETE permissions. HetuEngine also supports the permissions of OWNERSHIP and ADMIN. + +- Data file permissions (that is, HDFS file permissions) + + HetuEngine database and table files are stored in HDFS. The created databases or tables are saved in the **/user/hive/warehouse** directory of HDFS by default. The system automatically creates subdirectories named after database names and database table names. To access a database or a table, the corresponding file permissions (READ, WRITE, and EXECUTE) on HDFS are required. + +To perform various operations on HetuEngine databases or tables, you need to associate the metadata permission and the HDFS file permission. For example, to query HetuEngine data tables, you need to associate the metadata permission SELECT with the READ and EXECUTE permissions on HDFS files. + +To use the management function of FusionInsight Manager GUI to manage the permissions of HetuEngine databases and tables, you only need to configure the metadata permission, and the system will automatically associate and configure the HDFS file permission. In this way, operations on the interface are simplified, improving efficiency. + +HetuEngine Application Scenarios and Related Permissions +-------------------------------------------------------- + +A user needs to join in the Hive group if a database is created using the HetuEngine service, and role authorization is not required. Users have all permissions on the databases or tables created by themselves in Hive or HDFS. They can create tables, select, delete, insert, or update data, and grant permissions to other users to allow them to access the tables and corresponding HDFS directories and files. + +A user can access the tables or database only with permissions. Permissions required for the user vary depending on different HetuEngine scenarios. + +.. table:: **Table 1** Typical HetuEngine scenarios and required permissions + + +------------------------------------------------+-------------------------------------------------------------+ + | Scenario | Required Permission | + +================================================+=============================================================+ + | Using HetuEngine tables, columns, or databases | Permissions required in different scenarios are as follows: | + | | | + | | - To create a table, the CREATE permission is required. | + | | - To query data, the SELECT permission is required. | + | | - To insert data, the INSERT permission is required. | + +------------------------------------------------+-------------------------------------------------------------+ + +In some special HetuEngine scenarios, other permissions must be configured separately. + +.. 
table:: **Table 2** Typical HetuEngine authentication scenarios and required permissions + + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Required Permission | + +==================================================================================================================================================================================================================================+==========================================================================================================================================================================================================================================================================+ + | Creating HetuEngine databases, tables, and foreign tables, or adding partitions to created tables or foreign tables when data files specified by Hive users are saved to other HDFS directories except **/user/hive/warehouse**. | The directory must exist, the client user must be the owner of the directory, and the user must have the READ, WRITE, and EXECUTE permissions on the directory. The user must have the READ and EXECUTE permissions of all the upper-layer directories of the directory. | + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Performing operations on all databases and tables in Hive | The user must be added to the **supergroup** user group, and be assigned the ADMIN permission. | + +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Enabling MetaStore Authentication +--------------------------------- + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Disable Ranger**. +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service**. +#. Restart the compute instance on HSConsole. For details, see :ref:`Managing a HetuEngine Compute Instance `. 
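+As a closing illustration of how the metadata permission and the HDFS file permission work together, the following is a minimal sketch of the HDFS-side checks for the scenario in Table 2 in which table or partition data is stored outside **/user/hive/warehouse**. The user name, group, and directory used here are placeholders only; adapt them to your environment.
+
+.. code-block:: bash
+
+   # In security mode, authenticate as the client user first ("hetu_user" is a placeholder).
+   kinit hetu_user
+
+   # Check the ownership and current permissions of the target directory (placeholder path).
+   hdfs dfs -ls -d /user/hetu_user/external_data
+   hdfs dfs -getfacl /user/hetu_user/external_data
+
+   # The client user must own the directory and hold READ, WRITE, and EXECUTE on it,
+   # as well as READ and EXECUTE on all of its upper-layer directories.
+   # Changing ownership usually requires an HDFS administrator.
+   hdfs dfs -chown hetu_user:hive /user/hetu_user/external_data
+   hdfs dfs -chmod 770 /user/hetu_user/external_data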
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_permission_management_overview.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_permission_management_overview.rst new file mode 100644 index 0000000..1315b00 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_permission_management_overview.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1722.html + +.. _mrs_01_1722: + +HetuEngine Permission Management Overview +========================================= + +HetuEngine supports permission control for clusters in security mode. For clusters in non-security mode, permission control is not performed. + +In security mode, HetuEngine provides two permission control modes: Ranger and MetaStore. The Ranger mode is used by default. + +The following table lists the differences between Ranger and MetaStore. Both Ranger and MetaStore support user, user group, and role authentication. + +.. table:: **Table 1** Differences between Ranger and MetaStore + + +-------------------------+------------------+-----------------------------------------------------------------+-----------------------------------------------------------------------------------+ + | Permission Control Mode | Permission Model | Supported Data Source | Description | + +=========================+==================+=================================================================+===================================================================================+ + | Ranger | PBAC | Hive, HBase, Elasticsearch, GaussDB, HetuEngine, and ClickHouse | Row filtering, column masking, and fine-grained permission control are supported. | + +-------------------------+------------------+-----------------------------------------------------------------+-----------------------------------------------------------------------------------+ + | MetaStore | RBAC | Hive | ``-`` | + +-------------------------+------------------+-----------------------------------------------------------------+-----------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_ranger-based_permission_control.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_ranger-based_permission_control.rst new file mode 100644 index 0000000..56604e9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_ranger-based_permission_control.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1723.html + +.. _mrs_01_1723: + +HetuEngine Ranger-based Permission Control +========================================== + +Newly installed clusters use Ranger for authentication by default. System administrators can use Ranger to configure the permissions to manage databases, tables, and columns of data sources for HetuEngine users. For details, see :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. + +If a cluster is upgraded from an earlier version or Ranger authentication is manually disabled for a cluster, enable Ranger authentication again by referring to :ref:`Enabling Ranger Authentication `. + +.. _mrs_01_1723__en-us_topic_0000001219351179_section622553792416: + +Enabling Ranger Authentication +------------------------------ + +#. 
Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Enable Ranger**. +#. Choose **Cluster** > **Services** > **HetuEngine** > **More** > **Restart Service**. +#. Restart the compute instance on HSConsole. For details, see :ref:`Managing a HetuEngine Compute Instance `. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/index.rst new file mode 100644 index 0000000..5a80a2e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_1721.html + +.. _mrs_01_1721: + +HetuEngine Permission Management +================================ + +- :ref:`HetuEngine Permission Management Overview ` +- :ref:`Creating a HetuEngine User ` +- :ref:`HetuEngine Ranger-based Permission Control ` +- :ref:`HetuEngine MetaStore-based Permission Control ` +- :ref:`Permission Principles and Constraints ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + hetuengine_permission_management_overview + creating_a_hetuengine_user + hetuengine_ranger-based_permission_control + hetuengine_metastore-based_permission_control/index + permission_principles_and_constraints diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/permission_principles_and_constraints.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/permission_principles_and_constraints.rst new file mode 100644 index 0000000..1702cb5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/permission_principles_and_constraints.rst @@ -0,0 +1,27 @@ +:original_name: mrs_01_1728.html + +.. _mrs_01_1728: + +Permission Principles and Constraints +===================================== + +General Constraints +------------------- + +- Access data sources in the same cluster using HetuEngine + + If Ranger authentication is enabled for HetuEngine, the PBAC permission policy of Ranger is used for authentication. + + If Ranger authentication is disabled for HetuEngine, the RBAC permission policy of MetaStore is used for authentication. + +- Access data sources in different clusters using HetuEngine + + The permission policy is controlled by the permissions of the HetuEngine client and the data source. (In Hive scenarios, it depends on HDFS.) + +- HetuEngine users do not support the **supergroup** user group. + +- When querying a view, you only need to grant the select permission on the target view. When querying a join table using a view, you need to grant the select permission on the view and table. + +.. note:: + + When the permission control type of HetuEngine is changed, the HetuEngine service, including the HetuEngine compute instance running on Yarn, needs to be restarted. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/index.rst new file mode 100644 index 0000000..3bf3931 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/index.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_1710.html + +.. 
_mrs_01_1710: + +Using HetuEngine +================ + +- :ref:`Using HetuEngine from Scratch ` +- :ref:`HetuEngine Permission Management ` +- :ref:`Creating HetuEngine Compute Instances ` +- :ref:`Configuring Data Sources ` +- :ref:`Managing Data Sources ` +- :ref:`Managing Compute Instances ` +- :ref:`Using the HetuEngine Client ` +- :ref:`Using the HetuEngine Cross-Source Function ` +- :ref:`Using HetuEngine Cross-Domain Function ` +- :ref:`Using a Third-Party Visualization Tool to Access HetuEngine ` +- :ref:`Function & UDF Development and Application ` +- :ref:`Introduction to HetuEngine Logs ` +- :ref:`HetuEngine Performance Tuning ` +- :ref:`Common Issues About HetuEngine ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_hetuengine_from_scratch + hetuengine_permission_management/index + creating_hetuengine_compute_instances + configuring_data_sources/index + managing_data_sources/index + managing_compute_instances/index + using_the_hetuengine_client + using_the_hetuengine_cross-source_function/index + using_hetuengine_cross-domain_function/index + using_a_third-party_visualization_tool_to_access_hetuengine/index + function_&_udf_development_and_application/index + introduction_to_hetuengine_logs + hetuengine_performance_tuning/index + common_issues_about_hetuengine/index diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/introduction_to_hetuengine_logs.rst b/doc/component-operation-guide-lts/source/using_hetuengine/introduction_to_hetuengine_logs.rst new file mode 100644 index 0000000..f1d3fc8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/introduction_to_hetuengine_logs.rst @@ -0,0 +1,142 @@ +:original_name: mrs_01_1744.html + +.. _mrs_01_1744: + +Introduction to HetuEngine Logs +=============================== + +Log Description +--------------- + +**Log paths:** + +The HetuEngine logs are stored in **/var/log/Bigdata/hetuengine/** and **/var/log/Bigdata/audit/hetuengine/**. + +**Log archiving rules**: + +Log archiving rules use the FixedWindowRollingPolicy policy. The maximum size of a single file and the maximum number of log archive files can be configured. The rules are as follows: + +- When the size of a single file exceeds the default maximum value, a new compressed archive file is generated. The naming rule of the compressed archive log file is as follows: **.\ *[ID]*.log.gz. +- When the number of log archive files reaches the maximum value, the earliest log file is deleted. + +By default, the maximum size of an audit log file is 30 MB, and the maximum number of log archive files is 20. + +By default, the maximum size of a run log file is 100 MB, and the maximum number of log archive files is 20. + +To change the maximum size of a single run log file or audit log file or change the maximum number of log archive files of an instance, perform the following operations: + +#. Log in to Manager. +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations**. +#. In the parameter list of log levels, search for **logback.xml** to view the current run log and audit log configurations of HSBroker, HSConsole, and HSFabric. +#. Select the configuration item to be modified and modify it. +#. Click **Save**, and then click **OK**. The configuration automatically takes effect after about 30 seconds. + +.. 
table:: **Table 1** Log list + + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Type | Name | Description | + +========================================+==========================================================================================================================+============================================================================================+ + | Installation, startup and stopping log | /var/log/Bigdata/hetuengine/hsbroker/prestart.log | HSBroker pre-processing script logs before startup | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsbroker/start.log | HSBroker Spring Boot startup logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsbroker/stop.log | HSBroker stop logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsbroker/postinstall.log | HSBroker post-installation logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/prestart.log | HSConsole pre-processing script logs before startup | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/start.log | HSConsole Spring Boot startup logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/stop.log | HSConsole stop logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/postinstall.log | HSConsole post-installation logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/prestart.log | HSFabric preprocessing script 
logs before startup | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/start.log | HSFabric startup logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/stop.log | HSFabric stop logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/postinstall.log | HSFabric post-installation logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Run log | /var/log/Bigdata/hetuengine/hsbroker/hsbroker.log | HSBroker run logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/hsconsole.log | HSConsole run logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/hsfabric.log | HSFabric run logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | hdfs://hacluster/hetuserverhistory/*Tenant*/*Coordinator or worker*/application_ID/container_ID/yyyyMMdd/server.log | Run logs of the HetuEngine compute instance | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Status check log | /var/log/Bigdata/hetuengine/hsbroker/service_check.log | HSBroker health check logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsbroker/service_getstate.log | HSBroker status check logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | 
/var/log/Bigdata/hetuengine/availability-check.log | Status check logs indicating whether the HetuEngine service is available | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/service_getstate.log | HSConsole status check logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/service_getstate.log | HSFabric status check logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Audit log | /var/log/Bigdata/audit/hetuengine/hsbroker/hsbroker-audit.log | HSBroker audit logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/audit/hetuengine/hsconsole/hsconsole-audit.log | HSConsole audit logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | hdfs://hacluster/hetuserverhistory/*Tenant*/coordinator/application_ID/container_ID/yyyyMMdd/hetuserver-engine-audit.log | Audit logs of the HetuEngine compute instance | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/audit/hetuengine/hsfabric/hsfabric-audit.log | HSFabric audit logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Clean log | /var/log/Bigdata/hetuengine/hsbroker/cleanup.log | HSBroker cleanup script logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/cleanup.log | HSConsole cleanup script logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsfabric/cleanup.log | HSFabric cleanup script logs | + 
+----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Initialization log | /var/log/Bigdata/hetuengine/hsbroker/hetupg.log | HSBroker metadata initialization logs | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/hsconsole/hetupg.log | HSConsole connection metadata logs. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | | /var/log/Bigdata/hetuengine/ranger-presto-plugin-enable.log | Operation logs generated when the Ranger plug-in is integrated into the HetuEngine kernel. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` describes the log levels provided by HetuEngine. The priorities of log levels are OFF, ERROR, WARN, INFO, and DEBUG in descending order. Logs whose levels are higher than or equal to the specified level are printed. The number of printed logs decreases as the specified log level increases. + +.. _mrs_01_1744__en-us_topic_0000001219351035_table589895023911: + +.. table:: **Table 2** Log levels + + +-------+------------------------------------------------------------------------------------------+ + | Level | Description | + +=======+==========================================================================================+ + | OFF | Logs of this level record no logs. | + +-------+------------------------------------------------------------------------------------------+ + | ERROR | Logs of this level record error information about the current event processing. | + +-------+------------------------------------------------------------------------------------------+ + | WARN | Logs of this level record exception information about the current event processing. | + +-------+------------------------------------------------------------------------------------------+ + | INFO | Logs of this level record normal running status information about the system and events. | + +-------+------------------------------------------------------------------------------------------+ + | DEBUG | Logs of this level record the system information and system debugging information. | + +-------+------------------------------------------------------------------------------------------+ + +To change the run log or audit log level of an instance, perform the following steps: + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations**. +#. In the parameter list of log levels, search for **logback.xml** to view the current run log and audit log levels of HSBroker, HSConsole, and HSFabric. +#. Select a desired log level. +#. Click **Save**, and then click **OK**. 
The configuration automatically takes effect after about 30 seconds. + +To change the HetuEngine Coordinator/Worker log level, perform the following steps: + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations**. +#. In the parameter list of log levels, search for **log.properties** to view the current log levels. +#. Select a desired log level. +#. Click **Save**, and then click **OK**. Wait until the operation is successful. +#. Choose **Cluster** > **Services** > **HetuEngine** > **Instance**, click the HSBroker instance in the role list, and choose **More** > **Restart Instance**. +#. After the HSBroker instance is restarted, choose **Cluster** > **Services** > **HetuEngine**. On the overview page, click the link next to **HSConsole WebUI** to go to the compute instance page. +#. Select a compute instance and click **Stop**. After the instance is stopped, click **Start** to restart it. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/adjusting_the_number_of_worker_nodes.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/adjusting_the_number_of_worker_nodes.rst new file mode 100644 index 0000000..5dafa9f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/adjusting_the_number_of_worker_nodes.rst @@ -0,0 +1,71 @@ +:original_name: mrs_01_2320.html + +.. _mrs_01_2320: + +Adjusting the Number of Worker Nodes +==================================== + +Scenarios +--------- + +On the HetuEngine web UI, you can adjust the number of worker nodes for a compute instance. In this way, resources can be expanded for the compute instance when resources are insufficient and released when the resources are idle. The number of workers can be adjusted manually or automatically. + +Prerequisites +------------- + +You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +.. note:: + + - When an instance is being scaled in or out, the original services are not affected and the instance can still be used. + - Instance scale-in/out is delayed to implement smooth adjustment of resource consumption within a long period of time. It cannot respond to the requirements of running SQL tasks for available resources in real time. + - After the instance scale-in/out function is enabled, restarting the HSBroker and Yarn services affects the scale-in/out function. If you need to restart the services, you are advised to disable the instance scale-in/out function first. + - Before scaling out a compute instance, ensure that the current queue has sufficient resources. Otherwise, the scale-out cannot reach the expected result and subsequent scale-in operations will be affected. + - To perform manual scale-in/out, log in to Manager, choose **HetuEngine** > **Configurations** > **All Configurations**, search for **application.customized.properties**, and add the **yarn.hetuserver.engine.flex.timeout.sec** parameter. The default value is **300** (in seconds). + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Compute Instance**. +#. 
Locate the row that contains the target instance, and click **Configure** in the **Operation** column. +#. If manual scale-in/out is required, change the number of workers on the configuration page and click **OK**. The compute instance enters the **SCALING OUT** or **SCALING IN** state. After the scale-in/out is complete, the compute instance status changes to **RUNNING**. +#. If automatic scale-in/out is required, choose **Configure Instance** > **Advanced Configuration** and click the **Scaling** switch. + + - **OFF**: Disable dynamic scale-in/out. + + - **ON**: Enable dynamic scale-in/out. For details, see :ref:`Table 1 `. :ref:`Figure 1 ` shows the configuration page. + + .. _mrs_01_2320__en-us_topic_0000001219029547_table10789151105917: + + .. table:: **Table 1** Parameters for dynamic scale-in/out + + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +===========================+===========================================================================================================================================================+===============+ + | Scale-out Threshold | When the average value of the instance resource usage in the scale-in/out decision-making period exceeds the threshold, the instance starts to scale out. | 0.9 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-out Size | Number of Workers to be added each time when the instance starts to scale out. | 1 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-out Decision Period | Interval for determining whether to scale out an instance. Unit: second | 200 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-in Threshold | When the average value of the instance resource usage in the scale-in/out decision-making period exceeds the threshold, the instance starts to scale in. | 0.1 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-in Size | Number of Workers to be reduced each time when the instance starts to scale in. | 1 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-in Decision Period | Interval for determining whether to scale in an instance. Unit: second | 300 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Load Collection Period | Interval for collecting instance load information. 
Unit: second | 10 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-out Timeout Period | Timeout period of the scale-out operation. Unit: second | 400 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + | Scale-in Timeout Period | Timeout period of the scale-in operation. Unit: second | 600 | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ + + .. _mrs_01_2320__en-us_topic_0000001219029547_fig1841055911467: + + .. figure:: /_static/images/en-us_image_0000001295899852.png + :alt: **Figure 1** Scaling out/in an instance + + **Figure 1** Scaling out/in an instance diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/configuring_resource_groups.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/configuring_resource_groups.rst new file mode 100644 index 0000000..6cd0c11 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/configuring_resource_groups.rst @@ -0,0 +1,330 @@ +:original_name: mrs_01_1732.html + +.. _mrs_01_1732: + +Configuring Resource Groups +=========================== + +Resource Group Introduction +--------------------------- + +The resource group mechanism controls the overall query load of the instance from the perspective of resource allocation and implements queuing policies for queries. Multiple resource groups can be created under a compute instance resource, and each submitted query is assigned to a specific resource group for execution. Before a resource group executes a new query, it checks whether the resource load of the current resource group exceeds the amount of resources allocated to it by the instance. If it is exceeded, new incoming queries are blocked, placed in a queue, or even rejected directly. However, the resource component does not cause the running query to fail. + +Application Scenarios of Resource Groups +---------------------------------------- + +Resource groups are used to manage resources in compute instances. Different resource groups are allocated to different users and queries to isolate resources. This prevents a single user or query from exclusively occupying resources in the compute instance. In addition, the weight and priority of resource components can be configured to ensure that important tasks are executed first. :ref:`Table 1 ` describes the typical application scenarios of resource groups. + +.. _mrs_01_1732__en-us_topic_0000001173631190_table14837640204918: + +.. 
table:: **Table 1** Typical application scenarios of resource groups + + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Typical Scenarios | Solution | + +=========================================================================================================================================================================================================================================+====================================================================================================================================================================================================================================================================================================================================+ + | As the number of business teams using the compute instance increases, there is no resource when a team's task becomes more important and does not want to execute a query. | Allocate a specified resource group to each team. Important tasks are assigned to resource groups with more resources. When the sum of the proportions of sub-resource groups is less than or equal to 100%, the resources of a queue cannot be preempted by other resource groups. This is similar to static resource allocation. | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | When the instance resource load is high, two users submit a query at the same time. At the beginning, both queries are queuing. When there are idle resources, the query of a specific user can be scheduled to obtain resources first. | Two users are allocated with different resource groups. Important tasks can be allocated to resource groups with higher weights or priorities. The scheduling policy is configured by schedulingPolicy. Different scheduling policies have different resource allocation sequences. | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | For ad hoc queries and batch queries, resources can be allocated more properly based on different SQL types. 
| You can match different resource groups for different query types, such as EXPLAIN, INSERT, SELECT, and DATA_DEFINITION, and allocate different resources to execute the query. | + +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Enabling a Resource Group +------------------------- + +When creating a compute instance, add custom configuration parameters to the **resource-groups.json** file. For details, see :ref:`3.e ` in :ref:`Creating HetuEngine Compute Instances `. + +.. _mrs_01_1732__en-us_topic_0000001173631190_section112695622518: + +Resource Group Properties +------------------------- + +For details about how to configure resource group attributes, see :ref:`Table 2 `. + +.. _mrs_01_1732__en-us_topic_0000001173631190_table1961772415811: + +.. table:: **Table 2** Resource group properties + + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Mandatory | Description | + +=======================+=======================+========================================================================================================================================================================================================================================================================================================================================================================================+ + | name | Yes | Resource group name | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | maxQueued | Yes | Maximum number of queued queries. When this threshold is reached, new queries will be rejected. | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hardConcurrencyLimit | Yes | Maximum number of running queries. 
| + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | softMemoryLimit | No | Maximum memory usage of a resource group. When the memory usage reaches this threshold, new tasks are queued. The value can be an absolute value (for example, 10 GB) or a percentage (for example, 10% of the cluster memory). | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | softCpuLimit | No | The CPU time that can be used in a period (see the **cpuQuotaPeriod** parameter in :ref:`Global Attributes `). You must also specify the **hardCpuLimit** parameter. When the threshold is reached, the CPU resources occupied by the query that occupies the maximum CPU resources in the resource group are reduced. | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hardCpuLimit | No | Maximum CPU time that can be used in a period. | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | schedulingPolicy | No | The scheduling policy for a specific query from the queuing state to the running state | + | | | | + | | | - fair (default) | + | | | | + | | | When multiple sub-resource groups in a resource group have queuing queries, the sub-resource groups obtain resources in turn based on the defined sequence. The query of the same sub-resource group obtains resources based on the first-come-first-executed rule. | + | | | | + | | | - weighted_fair | + | | | | + | | | The **schedulingWeight** attribute is configured for each resource group that uses this policy. Each sub-resource group calculates a ratio: *Number of queried sub-resource groups*/Scheduling weight. A sub-resource group with a smaller ratio obtains resources first. | + | | | | + | | | - weighted | + | | | | + | | | The default value is **1**. A larger value of **schedulingWeight** indicates that resources are obtained earlier. | + | | | | + | | | - query_priority | + | | | | + | | | All sub-resource groups must be set with **query_priority**. Resources are obtained in the sequence specified by **query_priority**. 
| + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | schedulingWeight | No | Weight of the group. For details, see **schedulingPolicy**. The default value is **1**. | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | jmxExport | No | If this parameter is set to **true**, group statistics are exported to the JMX for monitoring. The default value is **false**. | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | subGroups | No | Subgroup list | + +-----------------------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Selector Rules +-------------- + +The selector matches resource groups in sequence. The first matched resource group is used. Generally, you are advised to configure a default resource group. If no default resource group is configured and other resource group selector conditions are not met, the query will be rejected. For details about how to set selector rule parameters, see :ref:`Table 3 `. + +.. _mrs_01_1732__en-us_topic_0000001173631190_table6836551111718: + +.. table:: **Table 3** Selector rules + + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Mandatory | Description | + +=======================+=======================+===========================================================================================================================================================================================================================================================================+ + | user | No | Regular expression for matching the user name. 
| + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | source | No | Data source to be matched with, such as JDBC, HBase, and Hive. For details, see the value of **--source** in :ref:`Configuration of Selector Attributes `. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | queryType | No | Task types: | + | | | | + | | | - **DATA_DEFINITION**: indicates that you can modify, create, or delete the metadata of schemas, tables, and views, and manage the query of prepared statements, permissions, sessions, and transactions. | + | | | - **DELETE**: indicates the DELETE queries. | + | | | - **DESCRIBE**: indicates the DESCRIBE, DESCRIBE INPUT, DESCRIBE OUTPUT, and SHOW queries. | + | | | - **EXPLAIN**: indicates the EXPLAIN queries. | + | | | - **INSERT**: indicates the INSERT and CREATE TABLE AS queries. | + | | | - **SELECT**: indicates the SELECT queries. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | clientTags | No | Match client tag to be matched with. Each tag must be in the tag list of the task submitted by the user. For details, see the value of **--client-tags** in :ref:`Configuration of Selector Attributes `. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | group | Yes | The resource group with running queries | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1732__en-us_topic_0000001173631190_section6811173510322: + +Global Attributes +----------------- + +For details about how to configure global attributes, see :ref:`Table 4 `. + +.. _mrs_01_1732__en-us_topic_0000001173631190_table159281244203010: + +.. 
table:: **Table 4** Global attributes + + +--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Mandatory | Description | + +====================+===========+=====================================================================================================================================================================================================================================+ + | cpuQuotaPeriod | No | Time range during which the CPU quota takes effect. This parameter is used together with **softCpuLimit** and **hardCpuLimit** in :ref:`Resource Group Properties `. | + +--------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_1732__en-us_topic_0000001173631190_section1347672818362: + +Configuration of Selector Attributes +------------------------------------ + +The data source name (**source**) can be set as follows: + +- **CLI**: Use the **--source** option. +- **JDBC**: Set the ApplicationName client information property on the Connection instance. + +The client tag (**clientTags**) can be configured as follows: + +- **CLI**: Use the **--client-tags** option. +- **JDBC**: Set the **ClientTags client info** property on the Connection instance. + +Configuration Example +--------------------- + +.. _mrs_01_1732__en-us_topic_0000001173631190_fig89011425373: + +.. figure:: /_static/images/en-us_image_0000001296219396.png + :alt: **Figure 1** Configuration example + + **Figure 1** Configuration example + +As shown in :ref:`Figure 1 `. + +- For the **global** resource group, a maximum of 100 queries can be executed at the same time. 1000 queries are in the queuing state. The **global** resource group has three sub-resource groups: **data_definition**, **adhoc**, and **pipeline**. +- Each user in the **pipeline** resource group can run a maximum of five queries at the same time, which occupy 50% of the memory resources of the pipeline resource group. By default, the **fair** scheduling policy is used in the **pipeline** resource group. Therefore, the query is executed in the sequence of "first come, first served". +- To make full use of instance resources, the total memory quota of all child resource groups can be greater than that of the parent resource group. For example, the sum of the memory quota of the **global** resource group (80%) and that of the **admin** resource group (100%) is 180%, which is greater than 100%. + +In the following example configuration, there are multiple resource groups, some of which are templates. HetuEngine administrators can use templates to dynamically build a resource group tree. For example, in the **pipeline\_**\ *${USER}* group, *${USER}* is the name of the user who submits a query. *${SOURCE}* is also supported, which will be the source where a query is submitted later. You can also use custom variables in **source** expressions and **user** regular expressions. + +The following is an example of a resource group selector: + +.. 
code-block::
+
+   "selectors": [{
+      "user": "bob",
+      "group": "admin"
+   },
+   {
+      "source": ".*pipeline.*",
+      "queryType": "DATA_DEFINITION",
+      "group": "global.data_definition"
+   },
+   {
+      "source": ".*pipeline.*",
+      "group": "global.pipeline.pipeline_${USER}"
+   },
+   {
+      "source": "jdbc#(?<toolname>.*)",
+      "clientTags": ["hipri"],
+      "group": "global.adhoc.bi-${toolname}.${USER}"
+   },
+   {
+      "group": "global.adhoc.other.${USER}"
+   }]
+
+There are five selectors that define the resource group in which a query runs:
+
+- The first selector matches queries from **bob** and places them in the **admin** group.
+- The second selector matches all data definition language (DDL) queries whose source name contains **pipeline** and places them in the **global.data_definition** group. This helps reduce the queuing time of such queries.
+- The third selector matches queries whose source contains **pipeline** and places them in per-user groups that are dynamically created under the **global.pipeline** group.
+- The fourth selector matches queries from BI tools whose source matches the regular expression **jdbc#(?<toolname>.*)** and whose client-provided tag list is a superset of **hipri**. These queries are placed in per-user groups that are dynamically created under the **global.adhoc.bi-${toolname}** group. The dynamic subgroups are named after the variable **toolname**, which is extracted from the **source** regular expression. For example, a query with the source **jdbc#powerfulbi**, user **kayla**, and client tags **hipri** and **fast** is routed to the **global.adhoc.bi-powerfulbi.kayla** resource group.
+- The last selector is a default selector that places all unmatched queries in per-user groups under **global.adhoc.other**.
+
+These selectors work together to implement the following policies:
+
+- HetuEngine administrator **bob** can run a maximum of 50 concurrent queries, which are executed according to the user-supplied priority.
+- For the remaining users:
+
+  - The total number of concurrent queries cannot exceed 100.
+  - A maximum of five concurrent DDL queries submitted with the source **pipeline** can run. They are executed in first-in-first-out (FIFO) order.
+  - Non-DDL queries run in the **global.pipeline** group, with a total concurrency of 45 and a per-user concurrency of 5. They are executed in FIFO order.
+  - Each BI tool can run a maximum of 10 concurrent queries, and each user can run a maximum of three concurrent queries. If the total number of concurrent queries exceeds 10, the user who runs the fewest queries gets the next concurrency slot. This makes the competition for resources fairer.
+  - All remaining queries are placed in per-user groups under **global.adhoc.other**.
+
+The query match selectors are described as follows:
+
+- Each pair of braces represents a selector. The five selectors match the following five resource groups:
+
+  .. code-block::
+
+     admin
+     global.data_definition
+     global.pipeline.pipeline_${USER}
+     global.adhoc.bi-${toolname}.${USER}
+     global.adhoc.other.${USER}
+
+- A query is assigned to a resource group only when it meets all conditions of the selector. For example, if user **amy** submits a query in JDBC mode and **clientTags** is not configured, the query cannot be allocated to the resource group **global.adhoc.bi-${toolname}.${USER}**.
+
+- When a query meets the conditions of two selectors at the same time, the first matching selector is used. For example, if user **bob** submits a DATA_DEFINITION job whose source is **pipeline**, the query is matched to the **admin** resource group rather than **global.data_definition**.
+
+- If none of the first four selectors matches, the per-user group **global.adhoc.other.${USER}** specified by the last selector is used. This resource group functions as the default resource group. If no default resource group is configured and a query does not meet the conditions of any other selector, the query is rejected.
+
+  The following is a complete example:
+
+  .. code-block::
+
+     {
+       "rootGroups": [{
+         "name": "global",
+         "softMemoryLimit": "80%",
+         "hardConcurrencyLimit": 100,
+         "maxQueued": 1000,
+         "schedulingPolicy": "weighted",
+         "jmxExport": true,
+         "subGroups": [{
+           "name": "data_definition",
+           "softMemoryLimit": "10%",
+           "hardConcurrencyLimit": 5,
+           "maxQueued": 100,
+           "schedulingWeight": 1
+         },
+         {
+           "name": "adhoc",
+           "softMemoryLimit": "10%",
+           "hardConcurrencyLimit": 50,
+           "maxQueued": 1,
+           "schedulingWeight": 10,
+           "subGroups": [{
+             "name": "other",
+             "softMemoryLimit": "10%",
+             "hardConcurrencyLimit": 2,
+             "maxQueued": 1,
+             "schedulingWeight": 10,
+             "schedulingPolicy": "weighted_fair",
+             "subGroups": [{
+               "name": "${USER}",
+               "softMemoryLimit": "10%",
+               "hardConcurrencyLimit": 1,
+               "maxQueued": 100
+             }]
+           },
+           {
+             "name": "bi-${toolname}",
+             "softMemoryLimit": "10%",
+             "hardConcurrencyLimit": 10,
+             "maxQueued": 100,
+             "schedulingWeight": 10,
+             "schedulingPolicy": "weighted_fair",
+             "subGroups": [{
+               "name": "${USER}",
+               "softMemoryLimit": "10%",
+               "hardConcurrencyLimit": 3,
+               "maxQueued": 10
+             }]
+           }]
+         },
+         {
+           "name": "pipeline",
+           "softMemoryLimit": "80%",
+           "hardConcurrencyLimit": 45,
+           "maxQueued": 100,
+           "schedulingWeight": 1,
+           "jmxExport": true,
+           "subGroups": [{
+             "name": "pipeline_${USER}",
+             "softMemoryLimit": "50%",
+             "hardConcurrencyLimit": 5,
+             "maxQueued": 100
+           }]
+         }]
+       },
+       {
+         "name": "admin",
+         "softMemoryLimit": "100%",
+         "hardConcurrencyLimit": 50,
+         "maxQueued": 100,
+         "schedulingPolicy": "query_priority",
+         "jmxExport": true
+       }],
+       "selectors": [{
+         "user": "bob",
+         "group": "admin"
+       },
+       {
+         "source": ".*pipeline.*",
+         "queryType": "DATA_DEFINITION",
+         "group": "global.data_definition"
+       },
+       {
+         "source": ".*pipeline.*",
+         "group": "global.pipeline.pipeline_${USER}"
+       },
+       {
+         "source": "jdbc#(?<toolname>.*)",
+         "clientTags": ["hipri"],
+         "group": "global.adhoc.bi-${toolname}.${USER}"
+       },
+       {
+         "group": "global.adhoc.other.${USER}"
+       }],
+       "cpuQuotaPeriod": "1h"
+     }
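+
+The selector attributes are supplied by the client when a query is submitted. The following is a minimal, hypothetical JDBC sketch (it is not part of the preceding configuration) showing how a Java client might set the **ApplicationName** and **ClientTags** client info properties described in **Configuration of Selector Attributes** so that the fourth selector matches. The broker addresses, user name, and password are placeholders, the tag list is assumed to be comma-separated, and additional connection properties (for example, SSL or tenant settings) may be required depending on the cluster security configuration. For the full set of connection properties, see the sections on using third-party tools to access HetuEngine.
+
+.. code-block::
+
+   import java.sql.Connection;
+   import java.sql.DriverManager;
+   import java.sql.ResultSet;
+   import java.sql.Statement;
+   import java.util.Properties;
+
+   public class SelectorAttributeExample {
+       public static void main(String[] args) throws Exception {
+           // Placeholder HSBroker URL; replace the broker addresses with real values.
+           String url = "jdbc:presto://<HSBrokerIP1:port1>,<HSBrokerIP2:port2>/hive/default?serviceDiscoveryMode=hsbroker";
+
+           Properties props = new Properties();
+           props.setProperty("user", "kayla");          // placeholder user name
+           props.setProperty("password", "password");   // placeholder password
+
+           try (Connection connection = DriverManager.getConnection(url, props)) {
+               // ApplicationName is reported to the server as the query source and is
+               // matched against the "source" regular expressions of the selectors.
+               connection.setClientInfo("ApplicationName", "jdbc#powerfulbi");
+
+               // ClientTags carries the client tag list; a selector matches only if all
+               // of its "clientTags" values are contained in this list.
+               connection.setClientInfo("ClientTags", "hipri,fast");
+
+               // With the example configuration above, this query would be routed to the
+               // global.adhoc.bi-powerfulbi.kayla resource group.
+               try (Statement statement = connection.createStatement();
+                    ResultSet resultSet = statement.executeQuery("SELECT 1")) {
+                   while (resultSet.next()) {
+                       System.out.println(resultSet.getInt(1));
+                   }
+               }
+           }
+       }
+   }
+
+On the CLI, the same effect can be achieved with the **--source** and **--client-tags** options.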
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/importing_and_exporting_compute_instance_configurations.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/importing_and_exporting_compute_instance_configurations.rst
new file mode 100644
index 0000000..011b62a
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/importing_and_exporting_compute_instance_configurations.rst
@@ -0,0 +1,30 @@
+:original_name: mrs_01_1733.html
+
+.. 
_mrs_01_1733: + +Importing and Exporting Compute Instance Configurations +======================================================= + +Scenarios +--------- + +On the HetuEngine web UI, you can import or export the instance configuration file and download the instance configuration template. + +Prerequisites +------------- + +You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + + - Importing an instance configuration file: Click **Import** above the instance list, select an instance configuration file in JSON format from the local PC, and click **Open**. + + .. important:: + + The instance configuration file must be named **upLoadConfig.json** so that it can be imported. + + - Exporting an instance configuration file: Click **Export** above the instance list to export the current instance configuration file to the local PC. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/index.rst new file mode 100644 index 0000000..386d80f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/index.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_1729.html + +.. _mrs_01_1729: + +Managing Compute Instances +========================== + +- :ref:`Configuring Resource Groups ` +- :ref:`Adjusting the Number of Worker Nodes ` +- :ref:`Managing a HetuEngine Compute Instance ` +- :ref:`Importing and Exporting Compute Instance Configurations ` +- :ref:`Viewing the Instance Monitoring Page ` +- :ref:`Viewing Coordinator and Worker Logs ` +- :ref:`Using Resource Labels to Specify on Which Node Coordinators Should Run ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + configuring_resource_groups + adjusting_the_number_of_worker_nodes + managing_a_hetuengine_compute_instance + importing_and_exporting_compute_instance_configurations + viewing_the_instance_monitoring_page + viewing_coordinator_and_worker_logs + using_resource_labels_to_specify_on_which_node_coordinators_should_run diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/managing_a_hetuengine_compute_instance.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/managing_a_hetuengine_compute_instance.rst new file mode 100644 index 0000000..4553501 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/managing_a_hetuengine_compute_instance.rst @@ -0,0 +1,67 @@ +:original_name: mrs_01_1736.html + +.. _mrs_01_1736: + +Managing a HetuEngine Compute Instance +====================================== + +Scenario +-------- + +On the HetuEngine web UI, you can start, stop, delete, and roll-restart a single compute instance or compute instances in batches. + +.. important:: + + - Restarting HetuEngine + + During the restart or rolling restart of HetuEngine, do not create, start, stop, or delete HetuEngine compute instances on HSConsole. 
+ + - Restarting HetuEngine compute instances + + - During the restart or rolling restart of HetuEngine compute instances, do not perform any change operations on the data sources on the HetuEngine and HetuEngine web UI, including restarting HetuEngine and changing its configurations. + + - If a compute instance has only one coordinator or worker, do not roll-restart the instance. + + - If the number of workers is greater than 10, the rolling restart may take more than 200 minutes. During this period, do not perform other O&M operations. + + - During the rolling restart of compute instances, HetuEngine releases Yarn resources and applies for them again. Ensure that the CPU and memory of Yarn are sufficient for starting 20% workers and Yarn resources are not preempted by other jobs. Otherwise, the rolling restart will fail. + + Viewing Yarn resources: Log in to FusionInsight Manager and choose **Tenant Resources**. On the navigation pane on the left, choose **Tenant Resources Management** to view the available queue resources of Yarn in the **Resource Quota** area. + + Viewing the CPU and memory of a worker container: Log in to FusionInsight Manager as a user who can access the HetuEngine WebUI and choose **Cluster** > **Services** > **HetuEngine**. In the **Basic Information** area, click the link next to **HSConsole WebUI** to go to the HSConsole page. Click **Operation** in the row where the target instance is located and click **Configure**. + + - During the rolling restart, ensure that Application Manager of coordinators or workers in the Yarn queue runs stably. + + Troubleshooting + + - If Application Manager of coordinators or workers in the Yarn queues is restarted during the rolling restart, the compute instances may be abnormal. In this case, you need to stop the compute instances and then start the compute instance for recovery. + - Compute instances are in the subhealthy state if they fail to be roll-restarted, which may lead to inconsistent configuration or number of coordinators or workers. In this case, the subhealth state of the instances will not be automatically restored. You need to manually check the instance status or restore the instance to healthy by performing the rolling restart again or stopping the compute instances. + +Prerequisites +------------- + +You have created an HetuEngine administrator for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +.. note:: + + - Users in the **hetuadmin** user group are HetuEngine administrators. Administrators have the permission to start, stop, and delete instances, and common users have only the permission to query instances. + - To modify the configuration of the current compute instance, you need to delete the instance on the HSConsole page. + +Procedure +--------- + +#. Log in to FusionInsight Manager as an administrator who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. In the **Operation** column of the instance, you can perform the following operations on a single job: + + - To start an instance, click **Start**. + - To stop an instance, click **Stop**. + - To delete an instance that is no longer used, click **Delete**. The configuration information of the instance is also deleted. + - To roll-restart an instance, click **Rolling Restart**. + +#. 
In the upper part of the instance list, you can perform the following operations on jobs: + + - To start instances in batches, select the target instances in the instance list and click **Start**. + - To stop instances in batches, select the target instances in the instance list and click **Stop**. + - To delete instances in batches, select the target instances in the instance list and click **Delete**. + - To roll-restart instances in batches, select the target instances in the instance list and click **Rolling Restart**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/using_resource_labels_to_specify_on_which_node_coordinators_should_run.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/using_resource_labels_to_specify_on_which_node_coordinators_should_run.rst new file mode 100644 index 0000000..68efb8e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/using_resource_labels_to_specify_on_which_node_coordinators_should_run.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_24260.html + +.. _mrs_01_24260: + +Using Resource Labels to Specify on Which Node Coordinators Should Run +====================================================================== + +By default, coordinator and worker nodes randomly start on Yarn NodeManager nodes, and you have to open all ports on all NodeManager nodes. Using resource labels of Yarn, HetuEngine allows you to specify NodeManager nodes to run coordinators. + +Prerequisites +------------- + +You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI. + +#. Set Yarn parameters to specify the scheduler to handle PlacementConstraints. + + a. Choose **Cluster** > **Services** > **Yarn**. Click the **Configurations** tab and then **All Configurations**. On the displayed page, search for **yarn.resourcemanager.placement-constraints.handler**, set **Value** to **scheduler**, and click **Save**. + b. Click the **Instance** tab, select the active and standby ResourceManager instances, click **More**, and select **Restart Instance** to restart the ResourceManager instances of Yarn. Then wait until they are restarted successfully. + +#. .. _mrs_01_24260__en-us_topic_0000001173630826_li163657291812: + + Configure resource labels. + + a. Choose **Tenant Resources** > **Resource Pool**. On the displayed page, click **Add Resource Pool**. + b. Select a cluster, and enter a resource pool name and a resource label name, for example, **pool1**. Select the desired hosts, click |image1| to add the selected hosts to the new resource pool, and click **OK**. + +#. Set HetuEngine parameters to enable the coordinator placement policy and enter the node resource label. + + a. Choose **Cluster** > **Service** > HetuEngine. Click the **Configurations** tab and then **All Configurations**. On the displayed page, set parameters and click **Save**. + + .. 
table:: **Table 1** Setting HetuEngine parameters + + +------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Setting | + +======================================================+=============================================================================================================================+ + | yarn.hetuserver.engine.coordinator.placement.enabled | true | + +------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | yarn.hetuserver.engine.coordinator.placement.label | Node resource label created in :ref:`3 `, for example, **pool1** | + +------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + + b. Click **Dashboard**, click **More**, and select **Service Rolling Restart**. Wait until the HetuEngine service is restarted successfully. + +#. Restart the HetuEngine compute instance. + + a. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + b. Locate the row that contains the target instance and click **Start** in the **Operation** column. + +#. Check the node on which the coordinator is running. + + a. Return to FusionInsight Manager. + + b. Choose **Cluster** > **Services** > **Yarn**. In the **Basic Information** area on the **Dashboard** page, click the link next to **ResourceManager WebUI**. + + c. In the navigation pane on the left, choose **Cluster** > **Nodes**. You can view that the coordinator has been started on the node in the resource pool created in :ref:`3 `. + + |image2| + +.. |image1| image:: /_static/images/en-us_image_0000001295899908.png +.. |image2| image:: /_static/images/en-us_image_0000001349139461.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_coordinator_and_worker_logs.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_coordinator_and_worker_logs.rst new file mode 100644 index 0000000..16cadc7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_coordinator_and_worker_logs.rst @@ -0,0 +1,23 @@ +:original_name: mrs_01_1735.html + +.. _mrs_01_1735: + +Viewing Coordinator and Worker Logs +=================================== + +Scenario +-------- + +On the HetuEngine web UI, you can view Coordinator and Worker logs on the Yarn web UI. + +Prerequisites +------------- + +You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Coordinator** or **Worker** in the **LogUI** column of the row where the target instance is located. Coordinator and Worker logs are displayed on the Yarn web UI. 
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_the_instance_monitoring_page.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_the_instance_monitoring_page.rst new file mode 100644 index 0000000..0de2239 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_compute_instances/viewing_the_instance_monitoring_page.rst @@ -0,0 +1,193 @@ +:original_name: mrs_01_1734.html + +.. _mrs_01_1734: + +Viewing the Instance Monitoring Page +==================================== + +Scenarios +--------- + +On the HetuEngine web UI, you can view the detailed information about a specified service, including the execution status of each SQL statement. If the current cluster uses dual planes, a Windows host that can connect to the cluster service plane is required. + +.. note:: + + You cannot view the compute instance task monitoring page using Internet Explorer. + +Prerequisites +------------- + +You have created an administrator for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as an administrator who can access the HetuEngine web UI and choose **Cluster** > *Name of the desired cluster* > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. + +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + +#. Locate the row of the target **Instance ID**, and click **LINK** in the **WebUI** column of the row. The compute instance task monitoring page is displayed on a new page. The **Query History** page is displayed for the first access. Click **Metrics** to view the compute instance task monitoring page. + + + .. figure:: /_static/images/en-us_image_0000001348740049.png + :alt: **Figure 1** Compute instance task monitoring page + + **Figure 1** Compute instance task monitoring page + + .. table:: **Table 1** Metric description + + +---------------------------+--------------------------------------------------------------------------------------------+ + | Metric | Description | + +===========================+============================================================================================+ + | Cluster CPU Usage | Indicates the CPU usage of the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Cluster Free Memory | Indicates the free memory of the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Average Cluster CPU Usage | Indicates the average CPU usage of the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Used Query Memory | Indicates the memory used by the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Running Queries | Indicates the number of tasks concurrently executed on the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Queued Queries | Indicates the number of tasks to be executed in the waiting queue on the current instance. 
| + +---------------------------+--------------------------------------------------------------------------------------------+ + | Blocked Queries | Indicates the number of blocked tasks on the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Active Workers | Indicates the number of valid Workers on the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Avg Running Tasks | Indicates the average number of running tasks in the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + | Avg CPU cycles per worker | Indicates the average CPU cycles of each Worker on the current instance. | + +---------------------------+--------------------------------------------------------------------------------------------+ + +#. Filter query tasks by **State** on the page. + + + .. figure:: /_static/images/en-us_image_0000001296060020.png + :alt: **Figure 2** Filtering tasks by **State** + + **Figure 2** Filtering tasks by **State** + + .. table:: **Table 2** State description + + +-----------------------+-------------------------------------------------------------+ + | State | Description | + +=======================+=============================================================+ + | Select All | Views tasks of all statuses. | + +-----------------------+-------------------------------------------------------------+ + | Queued | Views the tasks to be executed in the waiting queue. | + +-----------------------+-------------------------------------------------------------+ + | Waiting For Resources | Views the tasks that wait for resources. | + +-----------------------+-------------------------------------------------------------+ + | Dispatching | Views the tasks that are being dispatched. | + +-----------------------+-------------------------------------------------------------+ + | Planning | Views the tasks that are being planned. | + +-----------------------+-------------------------------------------------------------+ + | Starting | Views the tasks that start running. | + +-----------------------+-------------------------------------------------------------+ + | Running | Views running tasks. | + +-----------------------+-------------------------------------------------------------+ + | Finishing | Views the tasks that are being finished. | + +-----------------------+-------------------------------------------------------------+ + | Finished | Views the finished tasks. | + +-----------------------+-------------------------------------------------------------+ + | Failed | Views failed tasks, which can be filtered by failure cause. | + +-----------------------+-------------------------------------------------------------+ + +#. Click a task ID to view the basic information, resource usage, stages, and tasks. For a failed task, you can view related logs on the detail query page. + + + .. figure:: /_static/images/en-us_image_0000001348740045.png + :alt: **Figure 3** Viewing task details + + **Figure 3** Viewing task details + + + .. figure:: /_static/images/en-us_image_0000001349059865.png + :alt: **Figure 4** Task resource utilization summary + + **Figure 4** Task resource utilization summary + + + .. figure:: /_static/images/en-us_image_0000001349139733.png + :alt: **Figure 5** Stages + + **Figure 5** Stages + + .. 
table:: **Table 3** Stages monitoring information + + +---------------------+--------------------------------------------------------------------------------------+ + | Monitoring Item | Description | + +=====================+======================================================================================+ + | SCHEDULED TIME SKEW | Indicates the scheduled time of the concurrent tasks on a node in the current stage. | + +---------------------+--------------------------------------------------------------------------------------+ + | CPU TIME SKEW | Indicates whether concurrent tasks have computing skew in any stage phase. | + +---------------------+--------------------------------------------------------------------------------------+ + + + .. figure:: /_static/images/en-us_image_0000001349139729.png + :alt: **Figure 6** Tasks + + **Figure 6** Tasks + + .. table:: **Table 4** **Tasks** monitoring items + + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Monitoring Item | Description | + +=================+=========================================================================================================================================================================================================================================================================+ + | ID | Indicates the ID of the task that is concurrently executed in multiple phases. The format is *Stage ID*:*Task ID*. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Host | Indicates the Worker node where the current task is being executed. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | State | Indicates the task execution status, including **PLANNED**, **RUNNING**, **FINISHED**, **CANCELED**, **ABORTED**, and **FAILED**. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Rows | Indicates the total number of data records read by a task. The unit is thousand (k) or million (M). By analyzing the number of data records read by different tasks in the same stage, you can quickly determine whether data skew occurs in the current task. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Rows/s | Indicates the number of data records read by a task per second. 
By analyzing the number of data records read by different tasks in the same stage, you can quickly determine whether the network bandwidth of the node is different and whether the node NIC is faulty. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Bytes | Indicates the data volume read by a task. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Bytes/s | Indicates the data volume read by a task per second. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Elapsed | Indicates the task execution duration. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | CPU Time | Indicates the CPU time used by a task. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Buffered | Indicates the buffer size of a task. | + +-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click the "Host" link to view the task resource usage of each node. + + + .. figure:: /_static/images/en-us_image_0000001349259313.png + :alt: **Figure 7** Resource usage of the Task node + + **Figure 7** Resource usage of the Task node + + .. table:: **Table 5** Monitoring metrics of node resources + + +-------------------------+------------------------------------------------------------+ + | Name | Description | + +=========================+============================================================+ + | Node ID | Indicates the host ID. | + +-------------------------+------------------------------------------------------------+ + | Heap Memory | Indicates the maximum heap memory size. | + +-------------------------+------------------------------------------------------------+ + | Processors | Indicates the number of processors. | + +-------------------------+------------------------------------------------------------+ + | Uptime | Indicates the running duration. | + +-------------------------+------------------------------------------------------------+ + | External Address | Indicates the external IP address. | + +-------------------------+------------------------------------------------------------+ + | Internal Address | Indicates the internal IP address. 
| + +-------------------------+------------------------------------------------------------+ + | Process CPU Utilization | Indicates the physical CPU utilization. | + +-------------------------+------------------------------------------------------------+ + | System CPU Utilization | Indicates the system CPU utilization. | + +-------------------------+------------------------------------------------------------+ + | Heap Utilization | Indicates the heap memory utilization. | + +-------------------------+------------------------------------------------------------+ + | Non-Heap Memory Used | Indicates the non-Heap memory size. | + +-------------------------+------------------------------------------------------------+ + | Memory Pools | Indicates the memory pool size of the current Worker node. | + +-------------------------+------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/index.rst new file mode 100644 index 0000000..9481d4c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/index.rst @@ -0,0 +1,14 @@ +:original_name: mrs_01_1720.html + +.. _mrs_01_1720: + +Managing Data Sources +===================== + +- :ref:`Managing an External Data Source ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + managing_an_external_data_source diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/managing_an_external_data_source.rst b/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/managing_an_external_data_source.rst new file mode 100644 index 0000000..7d22c9d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/managing_data_sources/managing_an_external_data_source.rst @@ -0,0 +1,37 @@ +:original_name: mrs_01_24061.html + +.. _mrs_01_24061: + +Managing an External Data Source +================================ + +Scenarios +--------- + +On the HetuEngine web UI, you can view, edit, and delete an added data source. + +Prerequisites +------------- + +You have created a HetuEngine administrator for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Viewing Data Source Information +------------------------------- + +#. Log in to Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The HetuEngine service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Data Source**. In the data source list, view the data source name, description, type, and creation time. + +Editing a Data Source +--------------------- + +#. Log in to Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The HetuEngine service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Data Source**, locate the row that contains the data source to be modified, click **Edit** in the **Operation** column, modify related information by referring to :ref:`Configuring Data Sources `, and click **OK**. + +Deleting a Data Source +---------------------- + +#. Log in to Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. 
The HetuEngine service page is displayed. +#. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Click **Data Source**, locate the row that contains the data source to be deleted, and click **Delete** in the **Operation** column. In the displayed dialog box, select **Deleted data sources cannot be restored. Exercise caution when performing this operation** and click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst new file mode 100644 index 0000000..2e848aa --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_2336.html + +.. _mrs_01_2336: + +Using a Third-Party Visualization Tool to Access HetuEngine +=========================================================== + +- :ref:`Usage Instruction ` +- :ref:`Using DBeaver to Access HetuEngine ` +- :ref:`Using Tableau to Access HetuEngine ` +- :ref:`Using PowerBI to Access HetuEngine ` +- :ref:`Using Yonghong BI to Access HetuEngine ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + usage_instruction + using_dbeaver_to_access_hetuengine + using_tableau_to_access_hetuengine + using_powerbi_to_access_hetuengine + using_yonghong_bi_to_access_hetuengine diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/usage_instruction.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/usage_instruction.rst new file mode 100644 index 0000000..e0bc8b1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/usage_instruction.rst @@ -0,0 +1,8 @@ +:original_name: mrs_01_24178.html + +.. _mrs_01_24178: + +Usage Instruction +================= + +To access the dual-plane environment, the cluster service plane must be able to communicate with the local Windows environment. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst new file mode 100644 index 0000000..8047d6a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst @@ -0,0 +1,241 @@ +:original_name: mrs_01_2337.html + +.. _mrs_01_2337: + +Using DBeaver to Access HetuEngine +================================== + +This section uses DBeaver 6.3.5 as an example to describe how to perform operations on HetuEngine. + +Prerequisites +------------- + +- The DBeaver has been installed properly. Download the DBeaver software from https://dbeaver.io/files/6.3.5/. + + .. note:: + + Currently, DBeaver 5.\ *x* and 6.\ *x* are supported. + +- A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +Method 1: Using ZooKeeper to access HetuEngine + +#. .. 
_mrs_01_2337__en-us_topic_0000001219029577_li1747527125: + + Download the HetuEngine client. + + a. Log in to FusionInsight Manager. + + b. Choose **Cluster** > **Services** > **HetuEngine** > **Dashboard**. + + c. In the upper right corner of the page, choose **More** > **Download Client** and download the **Complete Client** to the local PC as prompted. + + d. .. _mrs_01_2337__en-us_topic_0000001219029577_li1727232161619: + + Decompress the HetuEngine client package **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_ HetuEngine\_Client.tar** to obtain the JDBC file and save it to a local directory, for example, **D:\\test**. + + .. note:: + + Obtaining the JDBC file: + + Obtain the **hetu-jdbc-*.jar** file from the **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_HetuEngine\_ClientConfig\\HetuEngine\\xxx\\** directory. + + Note: **xxx** can be **arm** or **x86**. + +#. Download the Kerberos authentication file of the HetuEngine user. + + a. Log in to FusionInsight Manager. + b. Choose **System** > **Permission** > **User**. + c. Locate the row that contains the target HetuEngine user, click **More** in the **Operation** column, and select **Download Authentication Credential**. + d. Decompress the downloaded package to obtain the **user.keytab** and **krb5.conf** files. + +#. Log in to the node where the HSBroker role is deployed in the cluster as user **omm**, go to the **${BIGDATA_HOME}/FusionInsight_Hetu\_8.1.2.2/xxx\ \_HSBroker/etc/** directory, and download the **jaas-zk.conf** and **hetuserver.jks** files to the local PC. + + .. note:: + + The version 8.1.2.2 is used as an example. Replace it with the actual version number. + + Modify the **jaas-zk.conf** file as follows. **keyTab** is the keytab file path of the user who accesses HetuEngine, and **principal** is *Username for accessing HetuEngine*\ **@Domain name in uppercase.COM**. + + .. code-block:: + + Client { + com.sun.security.auth.module.Krb5LoginModule required + useKeyTab=true + keyTab="D:\\tmp\\user.keytab" + principal="admintest@HADOOP.COM" + useTicketCache=false + storeKey=true + debug=true; + }; + +#. Add the host mapping to the local **hosts** file. The content format is as follows: + + *Host IP address Host name* + + Example: 192.168.23.221 192-168-23-221 + + .. note:: + + The local **hosts** file in a Windows environment is stored in, for example, **C:\\Windows\\System32\\drivers\\etc**. + +#. Configure the DBeaver startup file **dbeaver.ini**. + + a. Add the Java path to the file. + + .. code-block:: + + -VM + C:\Program Files\Java\jdk1.8.0_131\bin + + b. Set the ZooKeeper and Kerberos parameters by referring to the following parameters. Replace the file paths with the actual paths. + + .. code-block:: + + -Dsun.security.krb5.debug=true + -Djava.security.auth.login.config=D:\tmp\jaas-zk.conf + -Dzookeeper.sasl.clientconfig=Client + -Dzookeeper.auth.type=kerberos + -Djava.security.krb5.conf=D:\tmp\krb5.conf + + .. note:: + + - The Greenwich Mean Time (GMT) is not supported. If the current time zone is GMT+, add **-Duser.timezone=UTC** to the **dbeaver.ini** file to change the time zone to UTC. + - If DBeaver is started, restart the DBeaver software for the new configuration items in the **dbeaver.ini** file to take effect. + +#. Start the DBeaver, right-click **Database Navigator**, and click **Create New Connection**. + +#. Search for **Presto** in the search box and double-click the Presto icon. + +#. Click **Edit Driver Settings**. + +#. Set **Class Name** to **io.prestosql.jdbc.PrestoDriver**. + +#. 
Enter the URL of HetuEngine in the **URL Template** text box. + + URL format: jdbc:presto://*IP address of node 1 where the ZooKeeper service resides*:2181,\ *IP address of node 2 where the ZooKeeper service resides*:2181,\ *IP address of node 3 where the ZooKeeper service resides*:2181/hive/default?serviceDiscoveryMode=zooKeeper&zooKeeperNamespace=hsbroker&zooKeeperServerPrincipal=zookeeper/hadoop.hadoop.com + + Example: **jdbc:presto://192.168.8.37:**\ 2181\ **,192.168.8.38:**\ 2181\ **,192.168.8.39:**\ 2181\ **/hive/default?serviceDiscoveryMode=zooKeeper&zooKeeperNamespace=hsbroker&zooKeeperServerPrincipal=zookeeper/hadoop.hadoop.com** + +#. Click **Add File** and select the obtained JDBC file obtained in :ref:`1.d `. + +#. Click **Connection properties**. On the **Connection properties** tab page, right-click and select **Add new property**. Set parameters by referring to :ref:`Table 1 `. + + .. _mrs_01_2337__en-us_topic_0000001219029577_table1173517153344: + + .. table:: **Table 1** Property information + + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Example Value | + +===================================+===========================================================================================================================================+ + | KerberosPrincipal | zhangsan | + | | | + | | .. note:: | + | | | + | | Human-machine user created in the cluster. For details, see :ref:`Creating a HetuEngine User `. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | KerberosKeytabPath | D:\\\\user.keytab | + | | | + | | .. note:: | + | | | + | | You need to configure this parameter when using the keytab mode for access. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | KerberosRemoteServiceName | HTTP | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | SSL | true | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | deploymentMode | on_yarn | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | tenant | default | + | | | + | | .. note:: | + | | | + | | The tenant to which the user belongs needs to be configured in the cluster. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | user | zhangsan | + | | | + | | .. note:: | + | | | + | | Human-machine user created in the cluster. For details, see :ref:`Creating a HetuEngine User `. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | password | zhangsan@##65331853 | + | | | + | | .. note:: | + | | | + | | - Password set when a human-machine user is created in the cluster. 
For details, see :ref:`Creating a HetuEngine User `. | + | | - You need to configure this parameter when using username and password for access. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | KerberosConfigPath | D:\\\\krb5.conf | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + | SSLTrustStorePath | D:\\\\hetuserver.jks | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ + + :ref:`Figure 1 ` shows an example of the parameter settings. + + .. _mrs_01_2337__en-us_topic_0000001219029577_fig16912205184112: + + .. figure:: /_static/images/en-us_image_0000001438431645.png + :alt: **Figure 1** Example of parameter settings + + **Figure 1** Example of parameter settings + +#. Click **OK**. + +#. Click **Finish**. The HetuEngine is successfully connected. + + .. note:: + + If a message is displayed indicating that you do not have the permission to view the table, configure the permission by referring to :ref:`Configuring Permissions for Tables, Columns, and Databases `. + +Method 2: Using HSBroker to access HetuEngine + +#. .. _mrs_01_2337__en-us_topic_0000001219029577_li29221671357: + + Obtain the JDBC JAR file by referring to :ref:`1 `. + +#. Open DBeaver, choose **Database** > **New Database Connection**, search for PrestoSQL, and open it. + +#. Click **Edit Driver Settings** and set parameters by referring to the following table. + + .. table:: **Table 2** Driver settings + + +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | Remarks | + +=======================+================================+============================================================================================================================+ + | Class Name | io.prestosql.jdbc.PrestoDriver | / | + +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ + | URL Template | URL of HetuEngine | URL format: | + | | | | + | | | jdbc:presto://<*HSBrokerIP1:port1*>,<*HSBrokerIP2:port2*>,<*HSBrokerIP3:port3*>/hive/default?serviceDiscoveryMode=hsbroker | + +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ + +#. Click **Add File** and upload the JDBC driver package obtained in :ref:`1 `. + +#. Click **Find Class**. The driver class is automatically obtained. Click **OK** to complete the driver setting, as shown in :ref:`Figure 2 `. + + .. _mrs_01_2337__en-us_topic_0000001219029577_fig7280201602711: + + .. figure:: /_static/images/en-us_image_0000001441091233.png + :alt: **Figure 2** Driver settings + + **Figure 2** Driver settings + +#. On the **Main** tab page for creating a connection, enter the user name and password, and click **Test Connection**. After the connection is successful, click **OK**, and then click **Finish**. + + + .. 
figure:: /_static/images/en-us_image_0000001349259429.png + :alt: **Figure 3** Creating a connection + + **Figure 3** Creating a connection + +#. After the connection is successful, the page shown in :ref:`Figure 4 ` is displayed. + + .. _mrs_01_2337__en-us_topic_0000001219029577_fig18372036443: + + .. figure:: /_static/images/en-us_image_0000001441208981.png + :alt: **Figure 4** Successful connection + + **Figure 4** Successful connection diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst new file mode 100644 index 0000000..38b80ef --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst @@ -0,0 +1,136 @@ +:original_name: mrs_01_24012.html + +.. _mrs_01_24012: + +Using PowerBI to Access HetuEngine +================================== + +Prerequisites +------------- + +- PowerBI has been installed. +- The JDBC JAR file has been obtained. For details, see :ref:`1 `. +- A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Use the default configuration to install **hetu-odbc-win64.msi**. Download link: `https://openlookeng.io/download.html `__. + + + .. figure:: /_static/images/en-us_image_0000001349259001.png + :alt: **Figure 1** Downloading the driver + + **Figure 1** Downloading the driver + +#. Configure data source driver. + + a. Run the following commands in the local command prompt to stop the ODBC service that is automatically started. + + **cd C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\bin** + + **mycat.bat stop** + + |image1| + + b. Replace the JDBC driver. + + Copy the JDBC JAR file obtained in :ref:`1 ` to the **C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\lib** directory and delete the original **hetu-jdbc-1.0.1.jar** file from the directory. + + c. Edit the protocol prefix of the ODBC **server.xml** file. + + Change the property value of **server.xml** in the **C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\conf** directory from **jdbc:lk://** to + + **jdbc:presto://**. + + d. .. _mrs_01_24012__en-us_topic_0000001173470764_li13423101229: + + Configure the connection mode of using the user name and password. + + Create a **jdbc_param.properties** file in a user-defined path, for example, **C:\\hetu**, and add the following content to the file: + + .. code-block:: + + user=admintest + password=admintest@##65331853 + + .. note:: + + **user**: indicates the username of the created human-machine user, for example, **admintest**. + + **password**: indicates the password of the created human-machine user, for example, **admintest@##65331853**. + + e. Run the following commands to restart the ODBC service: + + **cd C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\bin** + + **mycat.bat restart** + + .. note:: + + The ODBC service must be stopped each time the configuration is modified. After the modification is complete, restart the ODBC service. + +#. 
On the Windows **Control Panel**, enter **odbc** to search for the ODBC management program. + + |image2| + +#. Choose **Add** > **openLookeng ODBC 1.1 Driver** > **Finish**. + + |image3| + +#. Enter the name and description as shown in the following figure and click **Next**. + + |image4| + +#. Configure parameters by referring to the following figure. Obtain **,,/hive/default?serviceDiscoveryMode=hsbroker** for **Connect URL** by referring to :ref:`2 `. Select the **jdbc_param.properties** file prepared in :ref:`2.d ` for **Connect Config**. Set **User name** to the user name that is used to download the credential. + + |image5| + +#. Click **Test DSN** to test the connection. If the connection is successful and both **Catalog** and **Schema** contain content, the connection is successful. Click **Next**. + + |image6| + + |image7| + +#. Click **Finish**. + + |image8| + +#. To use PowerBI for interconnection, choose **Get data** > **All** > **ODBC** > **Connect**. + + |image9| + +#. Select the data source to be added and click **OK**. + + + .. figure:: /_static/images/en-us_image_0000001349259005.png + :alt: **Figure 2** Adding a data source + + **Figure 2** Adding a data source + +#. (Optional) Enter **User name** and **Password** of the user who downloads the credential, and click **Connect**. + + + .. figure:: /_static/images/en-us_image_0000001296219336.png + :alt: **Figure 3** Entering the database username and password + + **Figure 3** Entering the database username and password + +#. After the connection is successful, all table information is displayed, as shown in :ref:`Figure 4 `. + + .. _mrs_01_24012__en-us_topic_0000001173470764_fig5250802327: + + .. figure:: /_static/images/en-us_image_0000001349059549.png + :alt: **Figure 4** Successful connection + + **Figure 4** Successful connection + +.. |image1| image:: /_static/images/en-us_image_0000001295899864.png +.. |image2| image:: /_static/images/en-us_image_0000001348739725.png +.. |image3| image:: /_static/images/en-us_image_0000001349139417.png +.. |image4| image:: /_static/images/en-us_image_0000001296059704.png +.. |image5| image:: /_static/images/en-us_image_0000001295739900.png +.. |image6| image:: /_static/images/en-us_image_0000001296219332.png +.. |image7| image:: /_static/images/en-us_image_0000001295899860.png +.. |image8| image:: /_static/images/en-us_image_0000001349139413.png +.. |image9| image:: /_static/images/en-us_image_0000001295739896.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst new file mode 100644 index 0000000..61f89b3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst @@ -0,0 +1,47 @@ +:original_name: mrs_01_24010.html + +.. _mrs_01_24010: + +Using Tableau to Access HetuEngine +================================== + +Prerequisites +------------- + +- Tableau has been installed. +- The JDBC JAR file has been obtained. For details, see :ref:`1 `. +- A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. 
Place the obtained JAR file to the Tableau installation directory, for example, **C:\\Program Files\\Tableau\\Drivers**. + +#. .. _mrs_01_24010__en-us_topic_0000001173789310_li6197135010379: + + Open Tableau, select **Other databases (JDBC)**, enter the URL and the username and password of the created human-machine user, and click **Sign In**. + + |image1| + + URL format: + + jdbc:presto://<*HSBrokerIP1:port1*>,<*HSBrokerIP2:port2*>,<*HSBrokerIP3:port3*>/hive/default?serviceDiscoveryMode=hsbroker + + Example: + + jdbc:presto://192.168.8.37:29860,192.168.8.38:29860,192.168.8.39:29860/hive/default?serviceDiscoveryMode=hsbroker + + Obtain the HSBroker node and port number: + + a. Log in to FusionInsight Manager. + b. Choose **Cluster** > **Services** > **HetuEngine** > **Role** > **HSBroker** to obtain the service IP addresses of all HSBroker instances. + c. Choose **Cluster** > **Services** > **HetuEngine** > **Configurations** > **All Configurations** and search for **server.port** on the right to obtain the port number of HSBroker. + + .. note:: + + - You can select one or more normal brokers for the HSBroker node and port number. + - If the connection fails, disable the proxy and try again. + +#. After the login is successful, drag the data table to be operated to the operation window on the right and refresh data. + +.. |image1| image:: /_static/images/en-us_image_0000001348740145.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst new file mode 100644 index 0000000..d3cb21b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst @@ -0,0 +1,62 @@ +:original_name: mrs_01_24013.html + +.. _mrs_01_24013: + +Using Yonghong BI to Access HetuEngine +====================================== + +Prerequisites +------------- + +- Yonghong BI has been installed. +- The JDBC JAR file has been obtained. For details, see :ref:`1 `. +- A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Open Yonghong Desktop and choose **Create Connection** > **presto**. + + |image1| + +#. On the data source configuration page, set parameters by referring to :ref:`Figure 1 `. **User** and **Password** are the username and password of the created human-machine user. After the configuration is complete, click **Test Connection**. + + .. _mrs_01_24013__en-us_topic_0000001219231283_fig171919211236: + + .. figure:: /_static/images/en-us_image_0000001295740208.png + :alt: **Figure 1** Configuring the data source + + **Figure 1** Configuring the data source + + - **Driver**: Choose **Custom** > **Select Custom Driver**. Click |image2|, edit the driver name, click **Upload File** to upload the obtained JDBC JAR file, and click **OK**. + + |image3| + + - **URL**: For details, see "URL format" in :ref:`2 `. + + - **Server Login**: Select **Username and Password** and enter the username and password. + +#. .. _mrs_01_24013__en-us_topic_0000001219231283_li55331456256: + + Click **New Data Set**. On the displayed page, change the save path by referring to :ref:`Figure 2 ` and click **OK**. + + .. 
_mrs_01_24013__en-us_topic_0000001219231283_fig10932125853718: + + .. figure:: /_static/images/en-us_image_0000001295900172.png + :alt: **Figure 2** Changing a path + + **Figure 2** Changing a path + +#. In the connection area, select **hetu** > **hive** > **default** > **Views**. In the **New Data Set** area on the right, select **SQL Data Set**. + + |image4| + +#. In the **Connection** area, select the new data set created in :ref:`3 `. All table information is displayed. Select a table, for example, **test**, and click **Refresh Data**. All table information is displayed in the **Data Details** area on the right. + + |image5| + +.. |image1| image:: /_static/images/en-us_image_0000001440850393.png +.. |image2| image:: /_static/images/en-us_image_0000001349139725.jpg +.. |image3| image:: /_static/images/en-us_image_0000001440970317.png +.. |image4| image:: /_static/images/en-us_image_0000001349259309.png +.. |image5| image:: /_static/images/en-us_image_0000001349059857.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_function_usage.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_function_usage.rst new file mode 100644 index 0000000..0b9e874 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_function_usage.rst @@ -0,0 +1,42 @@ +:original_name: mrs_01_2335.html + +.. _mrs_01_2335: + +HetuEngine Cross-Domain Function Usage +====================================== + +#. Open the data source in the local domain. You can create a virtual schema to shield the real schema information and instance information of the physical data source in the local domain from remote access requests. The remote end can use the virtual schema name to access the data source in the local domain. + + .. code-block:: + + CREATE VIRTUAL SCHEMA hive01.vschema01 WITH ( + catalog = 'hive01', + schema = 'ins1' + ); + +#. Register the data source of the HetuEngine type on the remote HetuEngine and add the local domain HetuEngine by referring to :ref:`Configuring a HetuEngine Data Source `. + +#. Use cross-domain collaborative analysis. + + .. code-block:: + + // 1. Open the hive1.ins2 data source on the remote HetuEngine. + CREATE VIRTUAL SCHEMA hive1.vins2 WITH ( + catalog = 'hive1', + schema = 'ins2' + ); + + // 2. Register three types of data sources, including Hive, GaussDB A, and HetuEngine, on HetuEngine in the local domain. + hetuengine> show catalogs; + Catalog + ---------- + dws + hetuengine_dc + hive + hive_dg + system + systemremote + (6 rows) + + // 3. Perform cross-source collaborative analysis on HetuEngine in the local domain. + select * from hive_dg.schema1.table1 t1 join hetuengine_dc.vins2.table3 t2 join dws.schema02.table4 t3 on t1.name = t2.item and t2.id = t3.cardNo; diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_rate_limit_function.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_rate_limit_function.rst new file mode 100644 index 0000000..c2426a5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/hetuengine_cross-domain_rate_limit_function.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_24284.html + +.. 
_mrs_01_24284: + +HetuEngine Cross-Domain Rate Limit Function +=========================================== + +#. .. _mrs_01_24284__en-us_topic_0000001219231227_li12347313587: + + Configure the **ratelimit.xml** configuration file for the HSFabric rate limit policy on the local host. The configuration template is as follows: + + .. code-block:: + + + + 0755 + 5120 + + + 0769 + 102400 + + + hsfabric01 + + 10.10.10.10:29900 + + + + + + 0770 + 20480 + + + hsfabric02 + + 10.10.10.10:29900 + + + + + + + + .. table:: **Table 1** Parameters in the **ratelimit.xml** file + + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | Parameter | Description | Example Value | + +===================+=======================================================================================================+===================+ + | current-domain | Name of zone to which the local domain data source belongs | 0755 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | default-bandwidth | Default data volume limit, in KB | 5120 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | domainName | Name of zone to which the target domain data source belongs | 0769 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | bandwidth | Data volume from the current zone (for example, city1) to the target zone (for example, city2), in KB | 102400 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | clusterName | Description of the cluster in the target domain | hsfabric01 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + | hsfabricHostPort | Service IP address and port number of HSFabric in the target domain | 10.10.10.10:29900 | + +-------------------+-------------------------------------------------------------------------------------------------------+-------------------+ + +#. Log in to Manager where the target domain cluster locates as an administrator who can access the HetuEngine WebUI. + +#. Choose **Cluster** > **Services** > **HetuEngine**. On the displayed page, click **Configurations** and then **All Configurations**. + +#. Choose **HSFabric(role)** > **Rate Limit** and click **Upload File** to upload the **ratelimit.xml** file prepared in :ref:`1 `. + +#. Click **Instance**, select the **HSFabric** instance, choose **More** > **Restart Instance**, and enter the password to restart the HSFabric instance. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst new file mode 100644 index 0000000..ebd3047 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_2342.html + +.. 
_mrs_01_2342:
+
+Using HetuEngine Cross-Domain Function
+======================================
+
+- :ref:`Introduction to HetuEngine Cross-Source Function `
+- :ref:`HetuEngine Cross-Domain Function Usage `
+- :ref:`HetuEngine Cross-Domain Rate Limit Function `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   introduction_to_hetuengine_cross-source_function
+   hetuengine_cross-domain_function_usage
+   hetuengine_cross-domain_rate_limit_function
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst
new file mode 100644
index 0000000..960e3e7
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst
@@ -0,0 +1,24 @@
+:original_name: mrs_01_2334.html
+
+.. _mrs_01_2334:
+
+Introduction to HetuEngine Cross-Source Function
+================================================
+
+HetuEngine provides unified standard SQL to implement efficient access to multiple data sources distributed across regions (or data centers), shielding differences in data structure, storage, and region, and decoupling data from applications.
+
+
+.. figure:: /_static/images/en-us_image_0000001349139853.png
+   :alt: **Figure 1** HetuEngine cross-region functions
+
+   **Figure 1** HetuEngine cross-region functions
+
+Key Technologies and Advantages of HetuEngine Cross-Domain Function
+--------------------------------------------------------------------
+
+- No single point of failure bottleneck: HSFabric supports horizontal scale-out and multi-channel parallel transmission, maximizing the transmission rate. Cross-domain latency is no longer a bottleneck.
+- Better computing resource utilization: Data compression and serialization tasks are delivered to Workers for parallel computing.
+- Efficient serialization: The data serialization format is optimized to reduce the amount of data transmitted for the same data volume.
+- Streaming transmission: Transmission is based on HTTP 2.0 streams, which preserves the universality of the HTTP protocol and reduces repeated RPC calls when a large amount of data is transmitted.
+- Resumable transmission: Prevents a large amount of data from being retransmitted when the connection is abnormally interrupted during data transmission.
+- Traffic control: The network bandwidth occupied by data transmission can be limited by region to prevent other services from being affected by exclusive traffic occupation in cross-region scenarios with limited bandwidth.
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_from_scratch.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_from_scratch.rst
new file mode 100644
index 0000000..70ce020
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_from_scratch.rst
@@ -0,0 +1,78 @@
+:original_name: mrs_01_1711.html
+
+.. _mrs_01_1711:
+
+Using HetuEngine from Scratch
+=============================
+
+This section describes how to use HetuEngine to connect to the Hive data source of the cluster and query its database tables.
+
+Prerequisites
+-------------
+
+- The HetuEngine and Hive services have been installed in the cluster and are running properly.
+- If Kerberos authentication has been enabled for the cluster, you need to create a HetuEngine user and grant related permissions to the user in advance. For details, see :ref:`Creating a HetuEngine User `. In addition, you need to configure the permissions to manage the databases, tables, and columns of the data source for the user using Ranger. For details, see :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. +- The cluster client has been installed, for example, in the **/opt/client** directory. + +Procedure +--------- + +#. Create and start a HetuEngine compute instance. + + a. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. + b. In the **Basic Information** area on the **Dashboard** tab page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + c. Click **Create Configuration** above the instance list. In the **Configure Instance** dialog box, configure parameters. + + #. In the **Basic Configuration** area, set **Resource Queue** to the tenant queue associated with the user. + #. Configure parameters in the **Coordinator Container Resource Configuration**, **Worker Container Resource Configuration**, and **Advanced Configuration** areas based on the actual resource plan. For details about the parameter configuration, see :ref:`Creating HetuEngine Compute Instances ` or retain the default values. + #. Select **Start Now** and wait until the instance configuration is complete. + +#. Log in to the node where the HetuEngine client is installed and run the following command to switch to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If the cluster is in security mode, run the following command to perform security authentication. If the cluster is in normal mode, skip this step. + + **kinit** *HetuEngine operation user* + + Example: + + **kinit** **hetu_test** + + Enter the password as prompted and change the password upon your first login. + +#. Run the following command to log in to the catalog of the data source: + + **hetu-cli --catalog** *Data source name* + + For example, run the following command: + + **hetu-cli --catalog** **hive** + + .. note:: + + The default name of the Hive data source of the cluster is **hive**. If you need to connect to an external data source, configure the external data source on HSConsole by referring to :ref:`Configuring Data Sources `. + + .. code-block:: + + java -Djava.security.auth.login.config=/opt/client/HetuEngine/hetuserver/conf/jaas.conf -Dzookeeper.sasl.clientconfig=Client -Dzookeeper.auth.type=kerberos -Djava.security.krb5.conf=/opt/client/KrbClient/kerberos/var/krb5kdc/krb5.conf -Djava.util.logging.config.file=/opt/client/HetuEngine/hetuserver/conf/hetuserver-client-logging.properties -jar /opt/client/HetuEngine/hetuserver/jars/hetu-cli-*-executable.jar --catalog hive --deployment-mode on_yarn --server https://10.112.17.189:24002,10.112.17.228:24002,10.112.17.150:24002?serviceDiscoveryMode=zooKeeper&zooKeeperNamespace=hsbroker --krb5-remote-service-name HTTP --krb5-config-path /opt/client/KrbClient/kerberos/var/krb5kdc/krb5.conf + hetuengine> + +#. Run the following command to view the database information: + + **show schemas;** + + .. 
code-block:: + + Schema + -------------------- + default + information_schema + (2 rows) + Query 20200730_080535_00002_ct2eg, FINISHED, 3 nodes + Splits: 36 total, 36 done (100.00%) + 0:02 [2 rows, 35B] [0 rows/s, 15B/s] diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_client.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_client.rst new file mode 100644 index 0000000..aaa6717 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_client.rst @@ -0,0 +1,59 @@ +:original_name: mrs_01_1737.html + +.. _mrs_01_1737: + +Using the HetuEngine Client +=========================== + +Scenario +-------- + +If a compute instance is not created or started, you can log in to the HetuEngine client to create or start the compute instance. This section describes how to manage a compute instance on the client in the O&M or service scenario. + +Prerequisites +------------- + +- The cluster client has been installed. For example, the installation directory is **/opt/client**. + +- You have created a common HetuEngine user, for example, **hetu_test** who has the permissions of the Hive (with Ranger disabled), **hetuuser**, and **default** queues. + + For details about how to create a user, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to the node where the HetuEngine client resides as the user who installs the client, and switch to the client installation directory. + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Log in to the HetuEngine client based on the cluster authentication mode. + + - In security mode, run the following command to complete user authentication and log in to the HetuEngine client: + + **kinit hetu_test** + + **hetu-cli --catalog hive --tenant default --schema default** + + - In normal mode, run the following command to log in to the HetuEngine client: + + **hetu-cli --catalog hive --tenant default --schema default --user hetu_test** + + .. note:: + + **hetu_test** is a service user who has at least the tenant role specified by **--tenant** and cannot be an OS user. + + Parameter description: + + - **--catalog:** (Optional) Specifies the name of the specified data source. + - **--tenant**: (Optional) Specifies the tenant resource queue started by the cluster. Do not specify it as the default queue of a tenant. When this parameter is used, the service user must have the role permission of the tenant. + - **--schema**: (Optional) Specifies the name of the schema of the data source to be accessed. + - **--user**: (Mandatory in normal mode) Specifies the name of the user who logs in to the client to execute services. The user must have at least the role of the queue specified by **--tenant**. + + .. note:: + + - When you log in to the client for the first time, you need to start the HetuEngine cluster on the server. The client page is displayed 40 seconds later. + - The client supports SQL syntax and is compatible with the SQL syntax of the open-source openLooKeng 1.2.0. 
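+
+After the client is started, you can run standard SQL statements to verify the connection. The following is a minimal sketch; the schema name **default** and the table name **test_tbl** are examples only and must be replaced with objects that exist in the connected data source.
+
+.. code-block::
+
+   -- List the schemas of the current catalog.
+   show schemas;
+   -- Switch to a schema and list its tables (the schema name is an example).
+   use default;
+   show tables;
+   -- Query sample data from a table (the table name is an example).
+   select * from test_tbl limit 10;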
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/index.rst
new file mode 100644
index 0000000..d802b33
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/index.rst
@@ -0,0 +1,16 @@
+:original_name: mrs_01_1738.html
+
+.. _mrs_01_1738:
+
+Using the HetuEngine Cross-Source Function
+==========================================
+
+- :ref:`Introduction to HetuEngine Cross-Source Function `
+- :ref:`Usage Guide of HetuEngine Cross-Source Function `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   introduction_to_hetuengine_cross-source_function
+   usage_guide_of_hetuengine_cross-source_function
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/introduction_to_hetuengine_cross-source_function.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/introduction_to_hetuengine_cross-source_function.rst
new file mode 100644
index 0000000..c9f16b2
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/introduction_to_hetuengine_cross-source_function.rst
@@ -0,0 +1,45 @@
+:original_name: mrs_01_1739.html
+
+.. _mrs_01_1739:
+
+Introduction to HetuEngine Cross-Source Function
+================================================
+
+Enterprises usually store massive amounts of data, for example, in various databases and warehouses, for management and information collection. However, diversified data sources, hybrid dataset structures, and scattered data storage raise the development cost of cross-source queries and prolong their duration.
+
+HetuEngine provides unified standard SQL statements to implement cross-source collaborative analysis, simplifying cross-source analysis operations.
+
+
+.. figure:: /_static/images/en-us_image_0000001349059629.png
+   :alt: **Figure 1** HetuEngine cross-source function
+
+   **Figure 1** HetuEngine cross-source function
+
+Key Technologies and Advantages of the HetuEngine Cross-Source Function
+------------------------------------------------------------------------
+
+- Computing pushdown: When HetuEngine is used for cross-source collaborative analysis, it enhances the computing pushdown capability in the dimensions listed in the following table to improve access efficiency.
+
+  .. table:: **Table 1** Dimensions of HetuEngine computing pushdown
+
+     ======================= ==========
+     Type                    Content
+     ======================= ==========
+     Basic Pushed Down       Predicate
+     \                       Projection
+     \                       Sub-query
+     \                       Limit
+     Aggregation Pushed Down Group by
+     \                       Order by
+     \                       Count
+     \                       Sum
+     \                       Min
+     \                       Max
+     Operator Pushed Down    <, >
+     \                       Like
+     \                       Or
+     ======================= ==========
+
+- Multi-source heterogeneous data: Collaborative analysis supports both structured data sources such as Hive and GaussDB and unstructured data sources such as HBase and Elasticsearch.
+- Global metadata: A mapping table is provided to map unstructured schemas to structured schemas, enabling HetuEngine to access HBase using SQL statements. Global management of data source information is provided.
+- Global permission control: Data source permissions can be opened to Ranger through HetuEngine for centralized management and control.
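+
+As an illustration of the pushdown dimensions in Table 1, the following sketch shows a cross-source query whose filter predicate, projected columns, and aggregation can be pushed down to the underlying data sources. The catalog, schema, table, and column names are examples only.
+
+.. code-block::
+
+   -- The WHERE predicate, the projected columns, and the GROUP BY aggregation
+   -- can be pushed down to the hive and dws data sources (all names are examples).
+   select t1.region, count(*) as order_cnt
+   from hive.schema1.orders t1
+   join dws.schema2.customers t2 on t1.customer_id = t2.id
+   where t1.order_date >= date '2021-01-01'
+   group by t1.region
+   order by order_cnt desc
+   limit 10;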
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/usage_guide_of_hetuengine_cross-source_function.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/usage_guide_of_hetuengine_cross-source_function.rst new file mode 100644 index 0000000..e36569d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_the_hetuengine_cross-source_function/usage_guide_of_hetuengine_cross-source_function.rst @@ -0,0 +1,73 @@ +:original_name: mrs_01_2341.html + +.. _mrs_01_2341: + +Usage Guide of HetuEngine Cross-Source Function +=============================================== + +#. Register the data source by referring to :ref:`Configuring a HetuEngine Data Source `. + +#. If the HBase data source is used, you need to create a structured mapping table. + + - The format of the statement for creating a mapping table is as follows: + + .. code-block:: + + CREATE TABLE schemaName.tableName ( + rowId VARCHAR, + qualifier1 TINYINT, + qualifier2 SMALLINT, + qualifier3 INTEGER, + qualifier4 BIGINT, + qualifier5 DOUBLE, + qualifier6 BOOLEAN, + qualifier7 TIME, + qualifier8 DATE, + qualifier9 TIMESTAMP + ) + WITH ( + column_mapping = 'qualifier1:f1:q1,qualifier2:f1:q2,qualifier3:f2:q3,qualifier4:f2:q4,qualifier5:f2:q5,qualifier6:f3:q1,qualifier7:f3:q2,qualifier8:f3:q3,qualifier9:f3:q4', + row_id = 'rowId', + hbase_table_name = 'hbaseNamespace:hbaseTable', + external = true + ); + -- Note: The value of schemaName must be the same as the value of hbaseNamespace in hbase_table_name. Otherwise, the table fails to be created. + + - Supported mapping tables: Mapping tables can be directly associated with tables in the HBase data source or created and associated with new tables that do not exist in the HBase data source. + + - Supported data types in a mapping table: VARCHAR, TINYINT, SMALLINT, INTEGER, BIGINT, DOUBLE, BOOLEAN, TIME, DATE, and TIMESTAMP + + - The following table describes the keywords in the statements for creating mapping tables. + + .. table:: **Table 1** Keywords in the statements for creating mapping tables + + +------------------+---------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Keyword | Type | Mandatory | Default Value | Remarks | + +==================+=========+===========+======================================================+=========================================================================================================================================================================================================================================================================================================================================================================+ + | column_mapping | String | No | All columns belong to the same Family column family. | Specify the mapping between columns in the mapping table and column families in the HBase data source table. If a table in the HBase data source needs to be associated, the value of **column_mapping** must be the same as that in the HBase data source. 
If you create a table that does not exist in the HBase data source, you need to specify **column_mapping**. | + +------------------+---------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | row_id | String | No | First column in the mapping table | Column name corresponding to the rowkey table in the HBase data source | + +------------------+---------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hbase_table_name | String | No | Null | Tablespace and table name of the HBase data source to be associated. Use a colon (:) to separate them. The default tablespace is **default**. If a new table that does not exist in the HBase data source is created, **hbase_table_name** does not need to be specified. | + +------------------+---------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | external | Boolean | No | true | If **external** is set to **true**, the table is a mapping table in the HBase data source and the original table in the HBase data source cannot be deleted. If **external** is set to false, the table in the HBase data source is deleted when the **Hetu-HBase** table is deleted. | + +------------------+---------+-----------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Use cross-source collaborative analysis. + + .. code-block:: + + // 1. Register three types of data sources, including Hive, Elasticsearch, and GaussDB A. + hetuengine> show catalogs; + Catalog + ---------- + dws + es + hive + hive_dg + system + systemremote + (6 rows) + + // 2. Compile SQL statements for cross-source collaborative analysis. 
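+      // The catalog, schema, and table names below are samples; replace them with
+      // objects available in the data sources registered in step 1 (hive_dg, es, and dws here).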
+ select * from hive_dg.schema1.table1 t1 join es.schema3.table3 t2 join dws.schema02.table4 t3 on t1.name = t2.item and t2.id = t3.cardNo; diff --git a/doc/component-operation-guide-lts/source/using_hive/access_control_of_a_dynamic_table_view_on_hive.rst b/doc/component-operation-guide-lts/source/using_hive/access_control_of_a_dynamic_table_view_on_hive.rst new file mode 100644 index 0000000..e051b23 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/access_control_of_a_dynamic_table_view_on_hive.rst @@ -0,0 +1,38 @@ +:original_name: mrs_01_0959.html + +.. _mrs_01_0959: + +Access Control of a Dynamic Table View on Hive +============================================== + +Scenario +-------- + +This section describes how to create a view on Hive when MRS is configured in security mode, authorize access permissions to different users, and specify that different users access different data. + +In the view, Hive can obtain the built-in function **current_user()** of the users who submit tasks on the client and filter the users. This way, authorized users can only access specific data in the view. + +.. note:: + + In normal mode, the **current_user()** function cannot distinguish users who submit tasks on the client. Therefore, the access control function takes effect only for Hive in security mode. + + If the **current_user()** function is used in the actual service logic, the possible risks must be fully evaluated during the conversion between the security mode and normal mode. + +Operation Example +----------------- + +- If the current_user function is not used, different views need to be created for different users to access different data. + + - Authorize the view **v1** permission to user **hiveuser1**. The user **hiveuser1** can access data with **type** set to **hiveuser1** in **table1**. + + **create view v1 as select \* from table1 where type='hiveuser1'** + + - Authorize the view **v2** permission to user **hiveuser2**. The user **hiveuser2** can access data with **type** set to **hiveuser2** in **table1**. + + **create view v2 as select \* from table1 where type='hiveuser2'** + +- If the current_user function is used, only one view needs to be created. + + Authorize the view **v** permission to users **hiveuser1** and **hiveuser2**. When user **hiveuser1** queries view **v**, the current_user() function is automatically converted to **hiveuser1**. When user **hiveuser2** queries view **v**, the **current_user()** function is automatically converted to **hiveuser2**. + + **create view v as select \* from table1 where type=current_user()** diff --git a/doc/component-operation-guide-lts/source/using_hive/authorizing_over_32_roles_in_hive.rst b/doc/component-operation-guide-lts/source/using_hive/authorizing_over_32_roles_in_hive.rst new file mode 100644 index 0000000..9af452b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/authorizing_over_32_roles_in_hive.rst @@ -0,0 +1,25 @@ +:original_name: mrs_01_0972.html + +.. _mrs_01_0972: + +Authorizing Over 32 Roles in Hive +================================= + +Scenario +-------- + +This function applies to Hive. + +The number of OS user groups is limited, and the number of roles that can be created in Hive cannot exceed 32. After this function is enabled, more than 32 roles can be created in Hive. + +.. note:: + + - After this function is enabled and the table or database is authorized, roles that have the same permission on the table or database will be combined using vertical bars (|). 
When the ACL permission is queried, the combined result is displayed, which is different from the result before the function is enabled. This operation is irreversible. Determine whether to make this adjustment based on the actual application scenario.
+   - If the current component uses Ranger for permission control, you need to configure related policies based on Ranger for permission management. For details, see :ref:`Adding a Ranger Access Permission Policy for Hive `.
+   - After this function is enabled, a maximum of 512 roles (including **owner**) are supported by default. The number is controlled by the user-defined parameter **hive.supports.roles.max** of MetaStore. You can change the value based on the actual application scenario.
+
+Procedure
+---------
+
+#. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations**.
+#. Choose **MetaStore(Role)** > **Customization**, add a customized parameter to the **hivemetastore-site.xml** parameter file, set **Name** to **hive.supports.over.32.roles**, and set **Value** to **true**. Restart all Hive instances after the modification.
diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/description_of_hive_table_location_either_be_an_obs_or_hdfs_path.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/description_of_hive_table_location_either_be_an_obs_or_hdfs_path.rst
new file mode 100644
index 0000000..46f7712
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/description_of_hive_table_location_either_be_an_obs_or_hdfs_path.rst
@@ -0,0 +1,18 @@
+:original_name: mrs_01_1763.html
+
+.. _mrs_01_1763:
+
+Description of Hive Table Location (Either Be an OBS or HDFS Path)
+==================================================================
+
+Question
+--------
+
+Can a Hive table be stored in either OBS or HDFS?
+
+Answer
+------
+
+#. The location of a common Hive table stored on OBS can be set to an HDFS path.
+#. In the same Hive service, you can create tables stored in OBS and in HDFS, respectively.
+#. For a Hive partitioned table stored on OBS, the location of a partition cannot be set to an HDFS path. (Likewise, for a partitioned table stored on HDFS, the location of a partition cannot be changed to OBS.)
diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/error_reported_when_the_where_condition_is_used_to_query_tables_with_excessive_partitions_in_fusioninsight_hive.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/error_reported_when_the_where_condition_is_used_to_query_tables_with_excessive_partitions_in_fusioninsight_hive.rst
new file mode 100644
index 0000000..06aa74a
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/error_reported_when_the_where_condition_is_used_to_query_tables_with_excessive_partitions_in_fusioninsight_hive.rst
@@ -0,0 +1,25 @@
+:original_name: mrs_01_1761.html
+
+.. _mrs_01_1761:
+
+Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive
+===============================================================================================================
+
+Question
+--------
+
+When a table with more than 32,000 partitions is created in Hive, an exception occurs when the table is queried with a WHERE condition on partitions.
In addition, the exception information printed in **metastore.log** contains the following information: + +.. code-block:: + + Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 32970 + at org.postgresql.core.PGStream.SendInteger2(PGStream.java:199) + at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1330) + at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1601) + at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1191) + at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:346) + +Answer +------ + +During a query with partition conditions, HiveServer optimizes the partitions to avoid full table scanning. All partitions whose metadata meets the conditions need to be queried. However, the **sendOneQuery** interface provided by GaussDB limits the parameter value to **32767** in the **sendParse** method. If the number of partition conditions exceeds **32767**, an exception occurs. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/hive_configuration_problems.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/hive_configuration_problems.rst new file mode 100644 index 0000000..079cb96 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/hive_configuration_problems.rst @@ -0,0 +1,56 @@ +:original_name: mrs_01_24117.html + +.. _mrs_01_24117: + +Hive Configuration Problems +=========================== + +- The error message "java.lang.OutOfMemoryError: Java heap space." is displayed during Hive SQL execution. + + Solution: + + - For MapReduce tasks, increase the values of the following parameters: + + **set mapreduce.map.memory.mb=8192;** + + **set mapreduce.map.java.opts=-Xmx6554M;** + + **set mapreduce.reduce.memory.mb=8192;** + + **set mapreduce.reduce.java.opts=-Xmx6554M;** + + - For Tez tasks, increase the value of the following parameter: + + **set hive.tez.container.size=8192;** + +- After a column name is changed to a new one using the Hive SQL **as** statement, the error message "Invalid table alias or column reference 'xxx'." is displayed when the original column name is used for compilation. + + Solution: Run the **set hive.cbo.enable=true;** statement. + +- The error message "Unsupported SubQuery Expression 'xxx': Only SubQuery expressions that are top level conjuncts are allowed." is displayed during Hive SQL subquery compilation. + + Solution: Run the **set hive.cbo.enable=true;** statement. + +- The error message "CalciteSubquerySemanticException [Error 10249]: Unsupported SubQuery Expression Currently SubQuery expressions are only allowed as Where and Having Clause predicates." is displayed during Hive SQL subquery compilation. + + Solution: Run the **set hive.cbo.enable=true;** statement. + +- The error message "Error running query: java.lang.AssertionError: Cannot add expression of different type to set." is displayed during Hive SQL compilation. + + Solution: Run the **set hive.cbo.enable=false;** statement. + +- The error message "java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFNumericStatsEvaluator.init." is displayed during Hive SQL execution. + + Solution: Run the **set hive.map.aggr=false;** statement. 
+ +- When **hive.auto.convert.join** is set to **true** (enabled by default) and **hive.optimize.skewjoin** is set to **true**, the error message "ClassCastException org.apache.hadoop.hive.ql.plan.ConditionalWork cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork" is displayed. + + Solution: Run the **set hive.optimize.skewjoin=false;** statement. + +- When **hive.auto.convert.join** is set to **true** (enabled by default), **hive.optimize.skewjoin** is set to **true**, and **hive.exec.parallel** is set to **true**, the error message "java.io.FileNotFoundException: File does not exist:xxx/reduce.xml" is displayed. + + Solution: + + - Method 1: Switch the execution engine to Tez. For details, see :ref:`Switching the Hive Execution Engine to Tez `. + - Method 2: Run the **set hive.exec.parallel=false;** statement. + - Method 3: Run the **set hive.auto.convert.join=false;** statement. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_delete_udfs_on_multiple_hiveservers_at_the_same_time.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_delete_udfs_on_multiple_hiveservers_at_the_same_time.rst new file mode 100644 index 0000000..83437b8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_delete_udfs_on_multiple_hiveservers_at_the_same_time.rst @@ -0,0 +1,72 @@ +:original_name: mrs_01_1753.html + +.. _mrs_01_1753: + +How Do I Delete UDFs on Multiple HiveServers at the Same Time? +============================================================== + +Question +-------- + +How can I delete permanent user-defined functions (UDFs) on multiple HiveServers at the same time? + +Answer +------ + +Multiple HiveServers share one MetaStore database. Therefore, there is a delay in the data synchronization between the MetaStore database and the HiveServer memory. If a permanent UDF is deleted from one HiveServer, the operation result cannot be synchronized to the other HiveServers promptly. + +In this case, you need to log in to the Hive client to connect to each HiveServer and delete permanent UDFs on the HiveServers one by one. The operations are as follows: + +#. Log in to the node where the Hive client is installed as the Hive client installation user. + +#. Run the following command to go to the client installation directory: + + **cd** *Client installation directory* + + For example, if the client installation directory is **/opt/client**, run the following command: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Run the following command to authenticate the user: + + **kinit** *Hive service user* + + .. note:: + + The login user must have the Hive admin rights. + +#. .. _mrs_01_1753__en-us_topic_0000001219029535_l7ef35cc9f4de4ef9966a1cda923d47e5: + + Run the following command to connect to the specified HiveServer: + + **beeline -u "jdbc:hive2://**\ *10.39.151.74*\ **:**\ *21066*\ **/default;sasl.qop=auth-conf;auth=KERBEROS;principal=**\ *hive*/*hadoop.@*" + + .. note:: + + - *10.39.151.74* is the IP address of the node where the HiveServer is located. + - *21066* is the port number of the HiveServer. The HiveServer port number ranges from 21066 to 21070 by default. Use the actual port number. + - *hive* is the username. For example, if the Hive1 instance is used, the username is **hive1**. 
+ - You can log in to FusionInsight Manager, choose **System** > **Permission** > **Domain and Mutual Trust**, and view the value of **Local Domain**, which is the current system domain name. + - **hive/hadoop.\ **** is the username. All letters in the system domain name contained in the username are lowercase letters. + +#. Run the following command to enable the Hive admin rights: + + **set role admin;** + +#. Run the following command to delete the permanent UDF: + + **drop function** *function_name*\ **;** + + .. note:: + + - *function_name* indicates the name of the permanent function. + - If the permanent UDF is created in Spark, the permanent UDF needs to be deleted from Spark and then from HiveServer by running the preceding command. + +#. Check whether the permanent UDFs are deleted from all HiveServers. + + - If yes, no further action is required. + - If no, go to :ref:`5 `. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_forcibly_stop_mapreduce_jobs_executed_by_hive.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_forcibly_stop_mapreduce_jobs_executed_by_hive.rst new file mode 100644 index 0000000..75cc637 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_forcibly_stop_mapreduce_jobs_executed_by_hive.rst @@ -0,0 +1,19 @@ +:original_name: mrs_01_1756.html + +.. _mrs_01_1756: + +How Do I Forcibly Stop MapReduce Jobs Executed by Hive? +======================================================= + +Question +-------- + +How do I stop a MapReduce task manually if the task is suspended for a long time? + +Answer +------ + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Yarn**. +#. On the left pane, click **ResourceManager(Host name, Active)**, and log in to Yarn. +#. Click the button corresponding to the task ID. On the task page that is displayed, click **Kill Application** in the upper left corner and click **OK** in the displayed dialog box to stop the task. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_monitor_the_hive_table_size.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_monitor_the_hive_table_size.rst new file mode 100644 index 0000000..203d1a5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_monitor_the_hive_table_size.rst @@ -0,0 +1,38 @@ +:original_name: mrs_01_1758.html + +.. _mrs_01_1758: + +How Do I Monitor the Hive Table Size? +===================================== + +Question +-------- + +How do I monitor the Hive table size? + +Answer +------ + +The HDFS refined monitoring function allows you to monitor the size of a specified table directory. + +Prerequisites +------------- + +- The Hive and HDFS components are running properly. +- The HDFS refined monitoring function is normal. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **HDFS** > **Resource**. + +#. Click the first icon in the upper left corner of **Resource Usage (by Directory)**, as shown in the following figure. + + |image1| + +4. In the displayed sub page for configuring space monitoring, click **Add**. +5. 
In the displayed **Add a Monitoring Directory** dialog box, set **Name** to the name or the user-defined alias of the table to be monitored and **Path** to the path of the monitored table. Click **OK**. In the monitoring result, the horizontal coordinate indicates the time, and the vertical coordinate indicates the size of the monitored directory. + +.. |image1| image:: /_static/images/en-us_image_0000001348739969.jpg diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_prevent_key_directories_from_data_loss_caused_by_misoperations_of_the_insert_overwrite_statement.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_prevent_key_directories_from_data_loss_caused_by_misoperations_of_the_insert_overwrite_statement.rst new file mode 100644 index 0000000..d451c4f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_do_i_prevent_key_directories_from_data_loss_caused_by_misoperations_of_the_insert_overwrite_statement.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_1759.html + +.. _mrs_01_1759: + +How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the **insert overwrite** Statement? +============================================================================================================== + +Question +-------- + +How do I prevent key directories from data loss caused by misoperations of the **insert overwrite** statement? + +Answer +------ + +During monitoring of key Hive databases, tables, or directories, to prevent data loss caused by misoperations of the **insert overwrite** statement, configure **hive.local.dir.confblacklist** in Hive to protect directories. + +This configuration item has been configured for directories such as **/opt/** and **/user/hive/warehouse** by default. + +Prerequisites +------------- + +The Hive and HDFS components are running properly. + +Procedure +--------- + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations**, and search for the **hive.local.dir.confblacklist** configuration item. + +3. Add paths of databases, tables, or directories to be protected in the parameter value. +4. Click **Save** to save the settings. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_to_perform_operations_on_local_files_with_hive_user-defined_functions.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_to_perform_operations_on_local_files_with_hive_user-defined_functions.rst new file mode 100644 index 0000000..015f9fb --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/how_to_perform_operations_on_local_files_with_hive_user-defined_functions.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_1755.html + +.. _mrs_01_1755: + +How to Perform Operations on Local Files with Hive User-Defined Functions +========================================================================= + +Question +-------- + +How to perform operations on local files (such as reading the content of a file) with Hive user-defined functions? + +Answer +------ + +By default, you can perform operations on local files with their relative paths in UDF. The following are sample codes: + +.. 
code-block:: + + public String evaluate(String text) { + // Open the file by its relative path; the file is distributed together with the UDF. + File file = new File("foo.txt"); + // ... read and process the file content here ... + // The input text is returned as a placeholder for the actual result. + return text; + } + +In Hive, upload the file **foo.txt** used in the UDF to HDFS, such as **hdfs://hacluster/tmp/foo.txt**. You can perform operations on the **foo.txt** file by creating a UDF with the following statement: + +**create function testFunc as 'some.class' using jar 'hdfs://hacluster/somejar.jar', file 'hdfs://hacluster/tmp/foo.txt';** + +In exception scenarios, if the value of **hive.fetch.task.conversion** is **more**, you can perform operations on local files in the UDF by using an absolute path instead of a relative path. In addition, you must ensure that the file exists on all HiveServer nodes and NodeManager nodes and that the **omm** user has the corresponding operation permissions. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/index.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/index.rst new file mode 100644 index 0000000..21be1e2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/index.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_1752.html + +.. _mrs_01_1752: + +Common Issues About Hive +======================== + +- :ref:`How Do I Delete UDFs on Multiple HiveServers at the Same Time? ` +- :ref:`Why Cannot the DROP operation Be Performed on a Backed-up Hive Table? ` +- :ref:`How to Perform Operations on Local Files with Hive User-Defined Functions ` +- :ref:`How Do I Forcibly Stop MapReduce Jobs Executed by Hive? ` +- :ref:`How Do I Monitor the Hive Table Size? ` +- :ref:`How Do I Prevent Key Directories from Data Loss Caused by Misoperations of the insert overwrite Statement? ` +- :ref:`Why Is Hive on Spark Task Freezing When HBase Is Not Installed? ` +- :ref:`Error Reported When the WHERE Condition Is Used to Query Tables with Excessive Partitions in FusionInsight Hive ` +- :ref:`Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client? ` +- :ref:`Description of Hive Table Location (Either Be an OBS or HDFS Path) ` +- :ref:`Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements? ` +- :ref:`Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition? ` +- :ref:`Why Does Hive Not Support Vectorized Query? ` +- :ref:`Hive Configuration Problems ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + how_do_i_delete_udfs_on_multiple_hiveservers_at_the_same_time + why_cannot_the_drop_operation_be_performed_on_a_backed-up_hive_table + how_to_perform_operations_on_local_files_with_hive_user-defined_functions + how_do_i_forcibly_stop_mapreduce_jobs_executed_by_hive + how_do_i_monitor_the_hive_table_size + how_do_i_prevent_key_directories_from_data_loss_caused_by_misoperations_of_the_insert_overwrite_statement + why_is_hive_on_spark_task_freezing_when_hbase_is_not_installed + error_reported_when_the_where_condition_is_used_to_query_tables_with_excessive_partitions_in_fusioninsight_hive + why_cannot_i_connect_to_hiveserver_when_i_use_ibm_jdk_to_access_the_beeline_client + description_of_hive_table_location_either_be_an_obs_or_hdfs_path + why_cannot_data_be_queried_after_the_mapreduce_engine_is_switched_after_the_tez_engine_is_used_to_execute_union-related_statements + why_does_hive_not_support_concurrent_data_writing_to_the_same_table_or_partition + why_does_hive_not_support_vectorized_query + hive_configuration_problems diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_data_be_queried_after_the_mapreduce_engine_is_switched_after_the_tez_engine_is_used_to_execute_union-related_statements.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_data_be_queried_after_the_mapreduce_engine_is_switched_after_the_tez_engine_is_used_to_execute_union-related_statements.rst new file mode 100644 index 0000000..364ae22 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_data_be_queried_after_the_mapreduce_engine_is_switched_after_the_tez_engine_is_used_to_execute_union-related_statements.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_2309.html + +.. _mrs_01_2309: + +Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements? +=================================================================================================================================== + +Question +-------- + +Hive uses the Tez engine to execute union-related statements to write data. After Hive is switched to the MapReduce engine for query, no data is found. + +Answer +------ + +When Hive uses the Tez engine to execute the union-related statement, the generated output file is stored in the **HIVE_UNION_SUBDIR** directory. After Hive is switched back to the MapReduce engine, files in the directory are not read by default. Therefore, data in the **HIVE_UNION_SUBDIR** directory is not read. + +In this case, you can set **mapreduce.input.fileinputformat.input.dir.recursive** to **true** to enable union optimization and determine whether to read data in the directory. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_i_connect_to_hiveserver_when_i_use_ibm_jdk_to_access_the_beeline_client.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_i_connect_to_hiveserver_when_i_use_ibm_jdk_to_access_the_beeline_client.rst new file mode 100644 index 0000000..c428d83 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_i_connect_to_hiveserver_when_i_use_ibm_jdk_to_access_the_beeline_client.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1762.html + +.. _mrs_01_1762: + +Why Cannot I Connect to HiveServer When I Use IBM JDK to Access the Beeline Client? 
+=================================================================================== + +Scenario +-------- + +When users check the JDK version used by the client, if the JDK version is IBM JDK, the Beeline client needs to be reconstructed. Otherwise, the client will fail to connect to HiveServer. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **System** > **Permission** > **User**. In the **Operation** column of the target user, choose **More** > **Download Authentication Credential**, select the cluster information, and click **OK** to download the keytab file. + +#. Decompress the keytab file and use WinSCP to upload the decompressed **user.keytab** file to the Hive client installation directory on the node to be operated, for example, **/opt/client**. + +#. Run the following command to open the **Hive/component_env** configuration file in the Hive client directory: + + **vi** *Hive client installation directory*\ **/Hive/component_env** + + Add the following content to the end of the line where **export CLIENT_HIVE_URI** is located: + + .. code-block:: + + \; user.principal=Username @HADOOP.COM\;user.keytab=user.keytab file path/user.keytab diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_the_drop_operation_be_performed_on_a_backed-up_hive_table.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_the_drop_operation_be_performed_on_a_backed-up_hive_table.rst new file mode 100644 index 0000000..eb45b33 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_cannot_the_drop_operation_be_performed_on_a_backed-up_hive_table.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_1754.html + +.. _mrs_01_1754: + +Why Cannot the DROP operation Be Performed on a Backed-up Hive Table? +===================================================================== + +Question +-------- + +Why cannot the **DROP** operation be performed for a backed up Hive table? + +Answer +------ + +Snapshots have been created for an HDFS directory mapping to the backed up Hive table, so the HDFS directory cannot be deleted. As a result, the Hive table cannot be deleted. + +When a Hive table is being backed up, snapshots are created for the HDFS directory mapping to the table. The snapshot mechanism of HDFS has the following limitation: If snapshots have been created for an HDFS directory, the directory cannot be deleted or renamed unless the snapshots are deleted. When the **DROP** operation is performed for a Hive table (except the EXTERNAL table), the system attempts to delete the HDFS directory mapping to the table. If the directory fails to be deleted, the system displays a message indicating that the table fails to be deleted. + +If you need to delete this table, manually delete all backup tasks related to this table. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_concurrent_data_writing_to_the_same_table_or_partition.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_concurrent_data_writing_to_the_same_table_or_partition.rst new file mode 100644 index 0000000..edf4a92 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_concurrent_data_writing_to_the_same_table_or_partition.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_2310.html + +.. 
_mrs_01_2310: + +Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition? +================================================================================= + +Question +-------- + +Why does data inconsistency occur when data is concurrently written to a Hive table through an API? + +Answer +------ + +Hive does not support concurrent data insertion for the same table or partition. As a result, multiple tasks perform operations on the same temporary data directory, and one task moves the data of another task, causing task data exceptions. To avoid this problem, modify the service logic so that data is inserted into the same table or partition in single-thread mode. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_vectorized_query.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_vectorized_query.rst new file mode 100644 index 0000000..a6afdd5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_does_hive_not_support_vectorized_query.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_2325.html + +.. _mrs_01_2325: + +Why Does Hive Not Support Vectorized Query? +=========================================== + +Question +-------- + +When the vectorized parameter **hive.vectorized.execution.enabled** is set to **true**, why do some null pointers or type conversion exceptions occur occasionally when Hive on Tez/MapReduce/Spark is executed? + +Answer +------ + +Currently, Hive does not support vectorized execution, because vectorized execution has many known community issues that have not been stably resolved. The default value of **hive.vectorized.execution.enabled** is **false**. You are advised not to set this parameter to **true**. diff --git a/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_is_hive_on_spark_task_freezing_when_hbase_is_not_installed.rst b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_is_hive_on_spark_task_freezing_when_hbase_is_not_installed.rst new file mode 100644 index 0000000..7881fe4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/common_issues_about_hive/why_is_hive_on_spark_task_freezing_when_hbase_is_not_installed.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_1760.html + +.. _mrs_01_1760: + +Why Is Hive on Spark Task Freezing When HBase Is Not Installed? +=============================================================== + +Scenario +-------- + +This function applies to Hive. + +Perform the following operations to configure parameters so that Hive on Spark tasks do not freeze when they are executed in an environment where HBase is not installed. + +.. note:: + + The Spark kernel version of Hive on Spark tasks has been upgraded to Spark2x. Hive on Spark tasks can be executed even if Spark2x is not installed. If HBase is not installed, when Spark tasks are executed, the system attempts to connect to ZooKeeper to access HBase until a timeout occurs by default. As a result, task freezing occurs. + + If HBase is not installed, perform the following operations to execute Hive on Spark tasks. If HBase is upgraded from an earlier version, you do not need to configure parameters after the upgrade. + +Procedure +--------- + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations**. +#. 
Choose **HiveServer(Role)** > **Customization**. Add a customized parameter to the **spark-defaults.conf** parameter file. Set **Name** to **spark.security.credentials.hbase.enabled**, and set **Value** to **false**. +#. Click **Save**. In the dialog box that is displayed, click **OK**. +#. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Instance**, select all Hive instances, choose **More** > **Restart Instance**, enter the password, and click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hive/configuring_hive_parameters.rst b/doc/component-operation-guide-lts/source/using_hive/configuring_hive_parameters.rst new file mode 100644 index 0000000..a0a57e3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/configuring_hive_parameters.rst @@ -0,0 +1,41 @@ +:original_name: mrs_01_0582.html + +.. _mrs_01_0582: + +Configuring Hive Parameters +=========================== + +Navigation Path +--------------- + +Go to the Hive configurations page by referring to :ref:`Modifying Cluster Service Configuration Parameters `. + +Parameter Description +--------------------- + +.. table:: **Table 1** Hive parameter description + + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | Parameter | Description | Default Value | + +========================================+======================================================================================================================================================================================================================================================================================================================================+=================================+ + | hive.auto.convert.join | Whether Hive converts common **join** to **mapjoin** based on the input file size. | Possible values are as follows: | + | | | | + | | When Hive is used to query a join table, whatever the table size is (if the data in the join table is less than 24 MB, it is a small one), set this parameter to **false**. If this parameter is set to **true**, new **mapjoin** cannot be generated when you query a join table. | - true | + | | | - false | + | | | | + | | | The default value is **true**. | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.default.fileformat | Indicates the default file format used by Hive. 
| RCFile | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.exec.reducers.max | Indicates the maximum number of reducers in a MapReduce job submitted by Hive. | 999 | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.server2.thrift.max.worker.threads | Indicates the maximum number of threads that can be started in the HiveServer internal thread pool. | 1,000 | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.server2.thrift.min.worker.threads | Indicates the number of threads started during initialization in the HiveServer internal thread pool. | 5 | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.hbase.delete.mode.enabled | Indicates whether to enable the function of deleting HBase records from Hive. If this function is enabled, you can use **remove table xx where xxx** to delete HBase records from Hive. | true | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.metastore.server.min.threads | Indicates the number of threads started by MetaStore for processing connections. If the number of threads is more than the set value, MetaStore always maintains a number of threads that is not lower than the set value, that is, the number of resident threads in the MetaStore thread pool is always higher than the set value. 
| 200 | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ + | hive.server2.enable.doAs | Indicates whether to simulate client users during sessions between HiveServer2 and other services (such as Yarn and HDFS). If you change the configuration item from **false** to **true**, users with only the column permission lose the permissions to access corresponding tables. | true | + +----------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hive/configuring_https_http-based_rest_apis.rst b/doc/component-operation-guide-lts/source/using_hive/configuring_https_http-based_rest_apis.rst new file mode 100644 index 0000000..faa4eb3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/configuring_https_http-based_rest_apis.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_0957.html + +.. _mrs_01_0957: + +Configuring HTTPS/HTTP-based REST APIs +====================================== + +Scenario +-------- + +WebHCat provides external REST APIs for Hive. By default, the open-source community version uses the HTTP protocol. + +MRS Hive supports the HTTPS protocol that is more secure, and enables switchover between the HTTP protocol and the HTTPS protocol. + +.. note:: + + The security mode supports HTTPS and HTTP, and the common mode supports only HTTP. + +Procedure +--------- + +#. The Hive service configuration page is displayed. + + Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. And choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations**. + +#. Modify the Hive configuration. + + Choose **WebHCat** > **Security**. On the page that is displayed, select **HTTPS** or **HTTP**. After the modification, restart the Hive service to use the corresponding protocol. diff --git a/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst b/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst new file mode 100644 index 0000000..0ce8b46 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst @@ -0,0 +1,118 @@ +:original_name: mrs_01_1751.html + +.. _mrs_01_1751: + +Connecting Hive with External RDS +================================= + +.. note:: + + - RDS indicates the relational database in this section. This section describes how to connect Hive with the open-source MySQL and Postgres databases. + - After an external metadata database is deployed in a cluster with Hive data, the original metadata table will not be automatically synchronized. Therefore, before installing Hive, you need to check whether the metadata is external RDS or built-in DBService. 
If the metadata is external RDS, you need to configure the metadata to external RDS when installing Hive or when there is no Hive data. After the installation, the metadata cannot be modified. Otherwise, the original metadata will be lost. + +**Hive supports access to open source MySQL and Postgres metabases.** + +#. Install the open source MySQL or Postgres database. + + .. note:: + + The node where the database is installed must be in the same network segment as the cluster, so that they can access each other. + +#. Upload the driver package. + + - Postgres: Use the open source Postgres driver package to replace the existing one of the cluster. Upload the Postgres driver package **postgresql-42.2.5.jar** to the *${BIGDATA_HOME}*\ **/third_lib/Hive** directory on all MetaStore instance nodes. To download the open-source driver package, visit https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/. + - MySQL: Go to the MySQL official website (https://www.mysql.com/), select **Downloads** > **Community** > **MySQL Connectors** > **Connector/J** to download the driver package of the corresponding version, and upload the driver package to the *${BIGDATA_HOME}*\ **/third_lib/Hive** directory on all RDSMetastore nodes. + +#. Create a database in RDS as the specified database of Hive metadata. + + Run the following command in Postgres or MySQL: + + **create database** *databasename*\ **;** + + .. note:: + + In the preceding command, *databasename* indicates the database name. + +#. Import the SQL statements for creating metadata tables. + + - Path of the Postgres SQL file: **${BIGDATA_HOME}/FusionInsight_HD\_8.1.2.2/install/FusionInsight-Hive-3.1.0/hive-3.1.0/scripts/metastore/upgrade/postgres/hive-schema-3.1.0.postgres.sql** + + Run the following command to import the SQL file to Postgres: + + **./bin/psql -U** *username* **-d** *databasename* **-f hive-schema-3.1.0.postgres.sql** + + Specifically: + + **./bin/psql** is in the Postgres installation directory. + + **username** indicates the username for logging in to Postgres. + + **databasename** indicates the database name. + + - Path of the MySQL file: **${BIGDATA_HOME}/FusionInsight_HD\_8.1.2.2/install/FusionInsight-Hive-3.1.0/hive-3.1.0/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql** + + Run the following command to import the SQL file to the MySQL database: + + **./bin/mysql -u** *username* **-p**\ *password* **-D**\ *databasename*\ **`. Choose **Cluster** > **Name of the desired cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **Hive(Service)** > **MetaDB**, modify the following parameters, and click **Save**. + + .. 
table:: **Table 1** Parameter description + + +---------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================================+=================================================================================================+============================================================================================+ + | javax.jdo.option.ConnectionDriverName | org.postgresql.Driver | Driver class for connecting metadata on MetaStore | + | | | | + | | | - If an external MySQL database is used, the value is: | + | | | | + | | | **com.mysql.jdbc.Driver** | + | | | | + | | | - If an external Postgres database is used, the value is: | + | | | | + | | | **org.postgresql.Driver** | + +---------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | javax.jdo.option.ConnectionURL | jdbc:postgresql://%{DBSERVICE_FLOAT_IP}%{DBServer}:%{DBSERVICE_CPORT}/hivemeta?socketTimeout=60 | URL of the JDBC link of the MetaStore metadata | + | | | | + | | | - If an external MySQL database is used, the value is: | + | | | | + | | | **jdbc:mysql://**\ *MySQL IP address*:*MySQL port number*\ **/test? useSSL=false** | + | | | | + | | | - If an external Postgres database is used, the value is: | + | | | | + | | | **jdbc:postgresql://**\ *Postgres IP address*\ **:**\ *Postgres port number*\ **/test** | + +---------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + | javax.jdo.option.ConnectionUserName | hive${SERVICE_INDEX}${SERVICE_INDEX} | Username for connecting to the metadata database on Metastore | + +---------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+ + +#. Change the Postgres database password in MetaStore. Choose **Cluster** > **Name of the desired cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **MetaStore(Role)** > **MetaDB**, modify the following parameters, and click **Save**. + + .. table:: **Table 2** Parameter description + + +--------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +============================================+===============+===========================================================================================================================+ + | javax.jdo.option.extend.ConnectionPassword | \*****\* | User password for connecting to the external metadata database on Metastore. The password is encrypted in the background. | + +--------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------+ + +#. 
Log in to each MetaStore background node and check whether the local directory **/opt/Bigdata/tmp** exists. + + - If yes, go to :ref:`8 `. + + - If no, run the following commands to create one: + + **mkdir -p /opt/Bigdata/tmp** + + **chmod 755 /opt/Bigdata/tmp** + +#. .. _mrs_01_1751__en-us_topic_0000001219350615_li24241321154318: + + Save the configuration. Choose **Dashboard** > **More** > **Restart Service**, and enter the password to restart the Hive service. diff --git a/doc/component-operation-guide-lts/source/using_hive/creating_databases_and_creating_tables_in_the_default_database_only_as_the_hive_administrator.rst b/doc/component-operation-guide-lts/source/using_hive/creating_databases_and_creating_tables_in_the_default_database_only_as_the_hive_administrator.rst new file mode 100644 index 0000000..6c2f66e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/creating_databases_and_creating_tables_in_the_default_database_only_as_the_hive_administrator.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_0969.html + +.. _mrs_01_0969: + +Creating Databases and Creating Tables in the Default Database Only as the Hive Administrator +============================================================================================= + +Scenario +-------- + +This function is applicable to Hive and Spark2x. + +After this function is enabled, only the Hive administrator can create databases and tables in the default database. Other users can use the databases only after being authorized by the Hive administrator. + +.. note:: + + - After this function is enabled, common users are not allowed to create a database or create a table in the default database. Based on the actual application scenario, determine whether to enable this function. + - Permissions of common users are restricted. In the scenario where common users have been used to perform operations, such as database creation, table script migration, and metadata recreation in an earlier version of database, the users can perform such operations on the database in the condition that this function is disabled temporarily after the database is migrated or after the cluster is upgraded. + +Procedure +--------- + +#. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations**. +#. Choose **HiveServer(Role)** > **Customization**, add a customized parameter to the **hive-site.xml** parameter file, set **Name** to **hive.allow.only.admin.create**, and set **Value** to **true**. Restart all Hive instances after the modification. +#. Determine whether to enable this function on the Spark2x client. + + - If yes, download and install the Spark2x client again. + - If no, no further action is required. diff --git a/doc/component-operation-guide-lts/source/using_hive/creating_user-defined_hive_functions.rst b/doc/component-operation-guide-lts/source/using_hive/creating_user-defined_hive_functions.rst new file mode 100644 index 0000000..bcc9e9c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/creating_user-defined_hive_functions.rst @@ -0,0 +1,93 @@ +:original_name: mrs_01_0963.html + +.. _mrs_01_0963: + +Creating User-Defined Hive Functions +==================================== + +When built-in functions of Hive cannot meet requirements, you can compile user-defined functions (UDFs) and use them for query. 
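+
+A common UDF is, in essence, a small Java class that inherits from **org.apache.hadoop.hive.ql.exec.UDF** and implements at least one **evaluate()** method. The following minimal sketch is for illustration only; it uses the class and package names of the AddDoublesUDF example described later in this section and assumes that the **hive-exec** dependency mentioned below is available:
+
+.. code-block::
+
+   package com.xxx.bigdata.hive.example.udf;
+
+   import org.apache.hadoop.hive.ql.exec.UDF;
+
+   public class AddDoublesUDF extends UDF {
+       // evaluate() can be overloaded; this variant adds an arbitrary number of floating point numbers.
+       public Double evaluate(Double... values) {
+           if (values == null || values.length == 0) {
+               return null;
+           }
+           double sum = 0;
+           for (Double v : values) {
+               if (v != null) {
+                   sum += v;
+               }
+           }
+           return sum;
+       }
+   }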
+ +According to implementation methods, UDFs are classified as follows: + +- Common UDFs: used to perform operations on a single data row and export a single data row. +- User-defined aggregating functions (UDAFs): used to input multiple data rows and export a single data row. +- User-defined table-generating functions (UDTFs): used to perform operations on a single data row and export multiple data rows. + +According to use methods, UDFs are classified as follows: + +- Temporary functions: used only in the current session and must be recreated after a session restarts. +- Permanent functions: used in multiple sessions. You do not need to create them every time a session restarts. + +.. note:: + + You need to properly control the memory and thread usage of variables in UDFs. Improper control may cause memory overflow or high CPU usage. + +The following uses AddDoublesUDF as an example to describe how to compile and use UDFs. + +Function +-------- + +AddDoublesUDF is used to add two or more floating point numbers. In this example, you can learn how to write and use UDFs. + +.. note:: + + - A common UDF must be inherited from **org.apache.hadoop.hive.ql.exec.UDF**. + - A common UDF must implement at least one **evaluate()**. The evaluate function supports overloading. + - To develop a customized function, you need to add the **hive-exec-3.1.0.jar** dependency package to the project. The package can be obtained from the Hive installation directory. + +How to Use +---------- + +#. Packing programs as **AddDoublesUDF.jar** on the client node, and upload the package to a specified directory in HDFS, for example, **/user/hive_examples_jars**. + + Both the user who creates the function and the user who uses the function must have the read permission on the file. + + The following are example statements: + + **hdfs dfs -put ./hive_examples_jars /user/hive_examples_jars** + + **hdfs dfs -chmod 777 /user/hive_examples_jars** + +#. Check the cluster authentication mode. + + - In security mode, log in to the beeline client as a user with the Hive management permission and run the following commands: + + **kinit** *Hive service user* + + **beeline** + + **set role admin;** + + - In common mode, run the following command: + + **beeline -n** *Hive service user* + +#. Define the function in HiveServer. Run the following SQL statement to create a permanent function: + + **CREATE FUNCTION** *addDoubles* **AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar\ ';** + + *addDoubles* indicates the function alias that is used for SELECT query. + + Run the following statement to create a temporary function: + + **CREATE TEMPORARY FUNCTION addDoubles AS 'com.xxx.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/hive_examples_jars/AddDoublesUDF.jar\ ';** + + - *addDoubles* indicates the function alias that is used for SELECT query. + - **TEMPORARY** indicates that the function is used only in the current session with the HiveServer. + +#. Run the following SQL statement to use the function on the HiveServer: + + **SELECT addDoubles(1,2,3);** + + .. note:: + + If an [Error 10011] error is displayed when you log in to the client again, run the **reload function;** command and then use this function. + +#. 
Run the following SQL statement to delete the function from the HiveServer: + + **DROP FUNCTION addDoubles;** + +Extended Applications +--------------------- + +None diff --git a/doc/component-operation-guide-lts/source/using_hive/customizing_row_separators.rst b/doc/component-operation-guide-lts/source/using_hive/customizing_row_separators.rst new file mode 100644 index 0000000..2e3da8f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/customizing_row_separators.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_0955.html + +.. _mrs_01_0955: + +Customizing Row Separators +========================== + +Scenario +-------- + +In most cases, a carriage return character is used as the row delimiter in Hive tables stored in text files, that is, the carriage return character is used as the terminator of a row during queries. However, some data files are delimited by special characters, and not a carriage return character. + +MRS Hive allows you to use different characters or character combinations to delimit rows of Hive text data. When creating a table, set **inputformat** to **SpecifiedDelimiterInputFormat**, and set the following parameter before search each time. Then the table data is queried by the specified delimiter. + +**set hive.textinput.record.delimiter='';** + +.. note:: + + The Hue component of the current version does not support the configuration of multiple separators when files are imported to a Hive table. + +Procedure +--------- + +#. Specify **inputFormat** and **outputFormat** when creating a table. + + **CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS]** *[db_name.]table_name* **[(**\ *col_name data_type* **[COMMENT** *col_comment*\ **],** *...*\ **)] [ROW FORMAT** *row_format*\ **] STORED AS inputformat 'org.apache.hadoop.hive.contrib.fileformat.SpecifiedDelimiterInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'** + +#. Specify the delimiter before search. + + **set hive.textinput.record.delimiter='!@!'** + + Hive will use '!@!' as the row delimiter. diff --git a/doc/component-operation-guide-lts/source/using_hive/deleting_single-row_records_from_hive_on_hbase.rst b/doc/component-operation-guide-lts/source/using_hive/deleting_single-row_records_from_hive_on_hbase.rst new file mode 100644 index 0000000..46170d0 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/deleting_single-row_records_from_hive_on_hbase.rst @@ -0,0 +1,29 @@ +:original_name: mrs_01_0956.html + +.. _mrs_01_0956: + +Deleting Single-Row Records from Hive on HBase +============================================== + +Scenario +-------- + +Due to the limitations of underlying storage systems, Hive does not support the ability to delete a single piece of table data. In Hive on HBase, MRS Hive supports the ability to delete a single piece of HBase table data. Using a specific syntax, Hive can delete one or more pieces of data from an HBase table. + +.. table:: **Table 1** Permissions required for deleting single-row records from the Hive on HBase table + + =========================== ========================== + Cluster Authentication Mode Required Permission + =========================== ========================== + Security mode SELECT, INSERT, and DELETE + Common mode None + =========================== ========================== + +Procedure +--------- + +#. 
To delete some data from an HBase table, run the following HQL statement: + + **remove table** *table_name* **where** *filter_condition*\ **;** + + In the preceding information, *filter_condition* specifies the filter condition of the data to be deleted. *table_name* indicates the Hive on HBase table from which data is to be deleted. diff --git a/doc/component-operation-guide-lts/source/using_hive/disabling_of_specifying_the_location_keyword_when_creating_an_internal_hive_table.rst b/doc/component-operation-guide-lts/source/using_hive/disabling_of_specifying_the_location_keyword_when_creating_an_internal_hive_table.rst new file mode 100644 index 0000000..f9f22d5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/disabling_of_specifying_the_location_keyword_when_creating_an_internal_hive_table.rst @@ -0,0 +1,27 @@ +:original_name: mrs_01_0970.html + +.. _mrs_01_0970: + +Disabling of Specifying the location Keyword When Creating an Internal Hive Table +================================================================================= + +Scenario +-------- + +This function is applicable to Hive and Spark2x. + +After this function is enabled, the **location** keyword cannot be specified when a Hive internal table is created. Specifically, after a table is created, the table path following the location keyword is created in the default **\\warehouse** directory and cannot be specified to another directory. If the location is specified when the internal table is created, the creation fails. + +.. note:: + + After this function is enabled, the location keyword cannot be specified during the creation of a Hive internal table. The table creation statement is restricted. If a table that has been created in the database is not stored in the default directory **/warehouse**, the **location** keyword can still be specified when the database creation, table script migration, or metadata recreation operation is performed by disabling this function temporarily. + +Procedure +--------- + +#. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations**. +#. Choose **HiveServer(Role)** > **Customization**, add a customized parameter to the **hive-site.xml** parameter file, set **Name** to **hive.internaltable.notallowlocation**, and set **Value** to **true**. Restart all Hive instances after the modification. +#. Determine whether to enable this function on the Spark2x client. + + - If yes, download and install the Spark2x client again. + - If no, no further action is required. diff --git a/doc/component-operation-guide-lts/source/using_hive/enabling_or_disabling_the_transform_function.rst b/doc/component-operation-guide-lts/source/using_hive/enabling_or_disabling_the_transform_function.rst new file mode 100644 index 0000000..5211e88 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/enabling_or_disabling_the_transform_function.rst @@ -0,0 +1,33 @@ +:original_name: mrs_01_0958.html + +.. _mrs_01_0958: + +Enabling or Disabling the Transform Function +============================================ + +Scenario +-------- + +The Transform function is not allowed by Hive of the open source version. + +MRS Hive supports the configuration of the Transform function. The function is disabled by default, which is the same as that of the open-source community version. + +Users can modify configurations of the Transform function to enable the function. However, security risks exist when the Transform function is enabled. + +.. 
note:: + + The Transform function can be disabled only in security mode. + +Procedure +--------- + +#. The Hive service configuration page is displayed. + + Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. And choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations**. + +#. Enter the parameter name in the search box, search for **hive.security.transform.disallow**, change the parameter value to **true** or **false**, and restart all HiveServer instances. + + .. note:: + + - If this parameter is set to **true**, the Transform function is disabled, which is the same as that in the open-source community version. + - If this parameter is set to **false**, the Transform function is enabled, which poses security risks. diff --git a/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst b/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst new file mode 100644 index 0000000..09f99cd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_0971.html + +.. _mrs_01_0971: + +Enabling the Function of Creating a Foreign Table in a Directory That Can Only Be Read +====================================================================================== + +Scenario +-------- + +This function is applicable to Hive and Spark2x. + +After this function is enabled, the user or user group that has the read and execute permissions on a directory can create foreign tables in the directory without checking whether the current user is the owner of the directory. In addition, the directory of a foreign table cannot be stored in the default directory **\\warehouse**. In addition, do not change the permission of the directory during foreign table authorization. + +.. note:: + + After this function is enabled, the function of the foreign table changes greatly. Based on the actual application scenario, determine whether to enable this function. + +Procedure +--------- + +#. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations**. +#. Choose **HiveServer(Role)** > **Customization**, add a customized parameter to the **hive-site.xml** parameter file, set **Name** to **hive.restrict.create.grant.external.table**, and set **Value** to **true**. +#. Choose **MetaStore(Role)** > **Customization**, add a customized parameter to the **hivemetastore-site.xml** parameter file, set **Name** to **hive.restrict.create.grant.external.table**, and set **Value** to **true**. Restart all Hive instances after the modification. +#. Determine whether to enable this function on the Spark2x client. + + - If yes, download and install the Spark2x client again. + - If no, no further action is required. diff --git a/doc/component-operation-guide-lts/source/using_hive/enhancing_beeline_reliability.rst b/doc/component-operation-guide-lts/source/using_hive/enhancing_beeline_reliability.rst new file mode 100644 index 0000000..c08e4ab --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/enhancing_beeline_reliability.rst @@ -0,0 +1,47 @@ +:original_name: mrs_01_0965.html + +.. 
_mrs_01_0965: + +Enhancing beeline Reliability +============================= + +Scenario +-------- + +- When the beeline client is disconnected due to network exceptions during the execution of a batch processing task, tasks submitted before beeline is disconnected can be properly executed in Hive. When you start the batch processing task again, the submitted tasks are not executed again and the tasks that have not been executed are executed in sequence. +- When the HiveServer service breaks down for some reason during the execution of a batch processing task, Hive ensures that the tasks that have been successfully executed are not executed again when the same batch processing task is started again. The execution starts from the task that had not been executed at the time HiveServer2 broke down. + +Example +------- + +#. Beeline is reconnected after being disconnected. + + Example: + + beeline -e "${*SQL*}" --hivevar batchid=\ *xxxxx* + +#. Beeline kills the running tasks. + + Example: + + beeline -e "" --hivevar batchid=\ *xxxxx* --hivevar kill=true + +#. Log in to the beeline client and start the mechanism of reconnection after disconnection. + + Log in to the beeline client and run the **set hivevar:batchid=**\ *xxxx* command. + + .. note:: + + Instructions: + + - *xxxx* indicates the batch ID of tasks submitted in the same batch using the beeline client. Batch IDs can be used to identify the task submission batch. If no batch ID is specified when a task is submitted, this feature is not enabled. + - If the running SQL script depends on data timeliness, you are advised not to enable the breakpoint reconnection mechanism; instead, use a new batch ID to submit tasks. Otherwise, during re-execution of the script, the SQL statements that have been executed are skipped and expired data may be obtained. + - If built-in time functions are used in the SQL script, it is recommended that you either not enable the breakpoint reconnection mechanism or use a new batch ID for each execution. The reason is the same as above. + - A SQL script contains one or more subtasks. If the SQL script contains the logic for deleting and creating temporary tables, it is recommended that the logic for deleting temporary tables be placed at the end of the script. If the subtasks executed after the temporary table deletion task fail to be executed and the subtasks before the deletion task use the temporary table, then when the SQL script is executed using the same batch ID for the next time, the compilation of the subtasks executed before the temporary table deletion task fails because the temporary table has been deleted (these subtasks are only compiled and not executed again; the task for creating the temporary table is excluded because its creation has already been completed). In this case, you are advised to use a new batch ID to execute the script. + + Parameter description: + + - **zk.cleanup.finished.job.interval**: indicates the interval for executing the cleanup task. The default interval is 60 seconds. + - **zk.cleanup.finished.job.outdated.threshold**: indicates the threshold of the node validity period. A node is generated for tasks in the same batch. The threshold is calculated from the end time of the execution of the current batch task. If the time exceeds 60 minutes, the node is deleted. + - **batch.job.max.retry.count**: indicates the maximum number of retry times of a batch task. 
If the number of retry times of a batch task exceeds the value of this parameter, the task execution record is deleted. The task will be executed from the first task when the task is started next time. The default value is **10**. + - **beeline.reconnect.zk.path**: indicates the root node for storing task execution progress. The default value for the Hive service is **/beeline**. diff --git a/doc/component-operation-guide-lts/source/using_hive/hive_log_overview.rst b/doc/component-operation-guide-lts/source/using_hive/hive_log_overview.rst new file mode 100644 index 0000000..1c85b23 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/hive_log_overview.rst @@ -0,0 +1,129 @@ +:original_name: mrs_01_0976.html + +.. _mrs_01_0976: + +Hive Log Overview +================= + +Log Description +--------------- + +**Log path**: The default save path of Hive logs is **/var/log/Bigdata/hive/**\ *role name*, the default save path of Hive1 logs is **/var/log/Bigdata/hive1/**\ *role name*, and the others follow the same rule. + +- HiveServer: **/var/log/Bigdata/hive/hiveserver** (run log) and **var/log/Bigdata/audit/hive/hiveserver** (audit log) +- MetaStore: **/var/log/Bigdata/hive/metastore** (run log) and **/var/log/Bigdata/audit/hive/metastore** (audit log) +- WebHCat: **/var/log/Bigdata/hive/webhcat** (run log) and **/var/log/Bigdata/audit/hive/webhcat** (audit log) + +**Log archive rule**: The automatic compression and archiving function of Hive is enabled. By default, when the size of a log file exceeds 20 MB (which is adjustable), the log file is automatically compressed. The naming rule of a compressed log file is as follows: <*Original log name*>-<*yyyy-mm-dd_hh-mm-ss*>.[*ID*].\ **log.zip** A maximum of 20 latest compressed files are reserved. The number of compressed files and compression threshold can be configured. + +.. table:: **Table 1** Hive log list + + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | Log Type | Log File Name | Description | + +=======================+============================================================================+=====================================================================================+ + | Run log | /hiveserver/hiveserver.out | Log file that records HiveServer running environment information. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/hive.log | Run log file of the HiveServer process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/hive-omm-**\ ``-``\ **-gc.log.\ ** | GC log file of the HiveServer process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/prestartDetail.log | Work log file before the HiveServer startup. 
| + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/check-serviceDetail.log | Log file that records whether the Hive service starts successfully | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/cleanupDetail.log | Cleanup log file about the HiveServer uninstallation | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/startDetail.log | Startup log file of the HiveServer process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/stopDetail.log | Shutdown log file of the HiveServer process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/localtasklog/omm\_\ **\ \_\ **.log | Run log file of the local Hive task. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /hiveserver/localtasklog/omm\_\ **\ \_\ **-gc.log.\ ** | GC log file of the local Hive task. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/metastore.log | Run log file of the MetaStore process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/hive-omm-**\ ``-``\ **-gc.log.\ ** | GC log file of the MetaStore process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/postinstallDetail.log | Work log file after the MetaStore installation. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/prestartDetail.log | Work log file before the MetaStore startup | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/cleanupDetail.log | Cleanup log file of the MetaStore uninstallation | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/startDetail.log | Startup log file of the MetaStore process. 
| + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/stopDetail.log | Shutdown log file of the MetaStore process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /metastore/metastore.out | Log file that records MetaStore running environment information. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/webhcat-console.out | Log file that records the normal start and stop of the WebHCat process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/webhcat-console-error.out | Log file that records the start and stop exceptions of the WebHCat process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/prestartDetail.log | Work log file before the WebHCat startup. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/cleanupDetail.log | Cleanup logs generated during WebHCat uninstallation or before WebHCat installation | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/hive-omm-<*Date*>--gc.log.<*No*.> | GC log file of the WebHCat process. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | /webhcat/webhcat.log | Run log file of the WebHCat process | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | Audit log | hive-audit.log | HiveServer audit log file | + | | | | + | | hive-rangeraudit.log | | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | metastore-audit.log | MetaStore audit log file. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | webhcat-audit.log | WebHCat audit log file. | + +-----------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------+ + | | jetty-.request.log | Request logs of the jetty service. 
+
+Log Levels
+----------
+
+:ref:`Table 2 <mrs_01_0976__en-us_topic_0000001173471504_t91045e1a946a46b4bac39028af62f3ad>` describes the log levels supported by Hive.
+
+The priorities of run log levels are ERROR, WARN, INFO, and DEBUG, from highest to lowest. Only run logs whose level is equal to or higher than the configured level are recorded, so the higher the specified log level, the fewer logs are recorded. A minimal sketch of this filtering rule follows the procedure below.
+
+.. _mrs_01_0976__en-us_topic_0000001173471504_t91045e1a946a46b4bac39028af62f3ad:
+
+.. table:: **Table 2** Log levels
+
+   +-------+------------------------------------------------------------------------------------------+
+   | Level | Description                                                                              |
+   +=======+==========================================================================================+
+   | ERROR | Logs of this level record error information about system running.                       |
+   +-------+------------------------------------------------------------------------------------------+
+   | WARN  | Logs of this level record exception information about the current event processing.     |
+   +-------+------------------------------------------------------------------------------------------+
+   | INFO  | Logs of this level record normal running status information about the system and events. |
+   +-------+------------------------------------------------------------------------------------------+
+   | DEBUG | Logs of this level record system information and system debugging information.          |
+   +-------+------------------------------------------------------------------------------------------+
+
+To modify log levels, perform the following operations:
+
+#. Go to the **All Configurations** page of the Hive service by referring to :ref:`Modifying Cluster Service Configuration Parameters `.
+#. On the menu bar on the left, select the log menu of the target role.
+#. Select a desired log level and save the configuration.
+
+   .. note::
+
+      The Hive log level takes effect immediately after being configured. You do not need to restart the service.
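+
+The level-based filtering described above means that a run log record is written only when its level is equal to or higher than the configured level. The following minimal sketch (Python, illustrative only and not Hive or MRS code) shows that rule:
+
+.. code-block:: python
+
+   # Minimal sketch of the level-based filtering described above (illustrative
+   # only): a record is written when its priority is greater than or equal to
+   # the configured level, so setting the level to WARN keeps ERROR and WARN
+   # records and drops INFO and DEBUG records.
+   PRIORITY = {"ERROR": 4, "WARN": 3, "INFO": 2, "DEBUG": 1}
+
+   def is_recorded(record_level: str, configured_level: str) -> bool:
+       return PRIORITY[record_level] >= PRIORITY[configured_level]
+
+   # With the level set to WARN, only ERROR and WARN records are recorded.
+   assert is_recorded("ERROR", "WARN")
+   assert is_recorded("WARN", "WARN")
+   assert not is_recorded("INFO", "WARN")
+   assert not is_recorded("DEBUG", "WARN")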
+
+Log Formats
+-----------
+
+The following table lists the Hive log formats:
+
+.. table:: **Table 3** Log formats
+
+   +-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+   | Log Type  | Format                                                                                                                                                              | Example                                                                                                                                                                                                                                                                             |
+   +===========+=====================================================================================================================================================================+=====================================================================================================================================================================================================================================================================================+
+   | Run log   | <yyyy-MM-dd HH:mm:ss,SSS>\|<Log level>\|<Name of the thread that generates the log>\|<Message in the log>\|<Location where the log event occurs>                    | 2014-11-05 09:45:01,242 \| INFO \| main \| Starting hive metastore on port 21088 \| org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5198)                                                                                                                    |
+   +-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+   | Audit log | |||