A rolling restart restarts all services in a cluster in batches after they are modified or upgraded, without interrupting workloads.
You can perform a rolling restart of a cluster as needed.
A rolling restart takes longer than a normal restart and may affect service throughput and performance.
Parameter | Description |
---|---|
Restart only instances with expired configurations in the cluster | Whether to restart only the modified instances in a cluster. |
Enable rack strategy | Whether to enable the concurrent rack rolling restart strategy. This parameter takes effect only for roles that meet the requirements of the rack rolling restart strategy, that is, roles that support rack awareness and whose instances belong to two or more racks. NOTE: This parameter is configurable only when a rolling restart is performed on HDFS or YARN. |
Data Nodes to Be Batch Restarted | Number of instances restarted in each batch when the batch rolling restart strategy is used. The default value is 1. |
Batch Interval | Interval between two consecutive batches of instances during a rolling restart. The default value is 0. |
Decommissioning Timeout Interval | Timeout interval for decommissioning role instances during a rolling restart. The default value is 1800s. Some roles (such as HiveServer and JDBCServer) stop providing services before the rolling restart. Stopped instances cannot accept connections from new clients, while existing connections are completed after a period of time. An appropriate timeout interval helps ensure service continuity. NOTE: This parameter is configurable only when a rolling restart is performed on Hive or Spark2x. |
Batch Fault Tolerance Threshold | Number of failed batches tolerated during a rolling restart. The default value is 0, which indicates that the rolling restart task ends as soon as any batch of instances fails to restart. |
Advanced parameters, such as Data Nodes to Be Batch Restarted, Batch Interval, and Batch Fault Tolerance Threshold, should be properly configured based on site requirements. Otherwise, services may be interrupted or cluster performance may be severely affected.
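
How the three advanced parameters interact can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the product's implementation: the function `rolling_restart`, its argument names, and the `restart` callback are hypothetical, and the fault tolerance threshold is interpreted as the number of failed batches allowed before the task ends.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    instances: Sequence[str],
    restart: Callable[[str], bool],   # hypothetical callback: True if the instance restarted
    batch_size: int = 1,              # models "Data Nodes to Be Batch Restarted"
    batch_interval: float = 0.0,      # models "Batch Interval" (seconds assumed here)
    fault_tolerance: int = 0,         # models "Batch Fault Tolerance Threshold"
) -> List[List[str]]:
    """Restart instances batch by batch and return the batches that fully succeeded.

    Raises RuntimeError once the number of failed batches exceeds fault_tolerance.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Split the instance list into consecutive batches of at most batch_size.
    batches = [list(instances[i:i + batch_size])
               for i in range(0, len(instances), batch_size)]

    failed_batches = 0
    succeeded: List[List[str]] = []
    for index, batch in enumerate(batches):
        # Restart every instance in the batch; the batch fails if any instance fails.
        results = [restart(name) for name in batch]
        if all(results):
            succeeded.append(batch)
        else:
            failed_batches += 1
            # With the default threshold of 0, the first failed batch ends the task.
            if failed_batches > fault_tolerance:
                raise RuntimeError(
                    f"batch {index + 1} failed; rolling restart aborted")
        # Wait between batches, but not after the last one.
        if index < len(batches) - 1 and batch_interval > 0:
            time.sleep(batch_interval)
    return succeeded
```

In this sketch, a larger `batch_size` shortens the restart but takes more instances offline at once, while a higher `fault_tolerance` lets the task continue past failed batches. This trade-off is why these values should match site requirements.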
Example: