Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1040"></a><a name="mrs_01_1040"></a>
<h1 class="topictitle1">Kafka Balancing Tool Instructions</h1>
<div id="body1590134593900"><div class="section" id="mrs_01_1040__section31432520151519"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1040__p40367418151531">This section describes how to use the Kafka balancing tool on a client to balance the load of the Kafka cluster based on service requirements in scenarios such as node decommissioning, node recommissioning, and load balancing. </p>
<p id="mrs_01_1040__p1944684185813">This section applies to MRS 3.<em id="mrs_01_1040__i467911402525">x</em> or later. For versions earlier than MRS 3.<em id="mrs_01_1040__i1729621312418">x</em>, see <a href="mrs_01_24299.html">Balancing Data After Kafka Node Scale-Out</a>.</p>
</div>
<div class="section" id="mrs_01_1040__section6225422151549"><h4 class="sectiontitle">Prerequisites</h4><ul id="mrs_01_1040__ul60396398151610"><li id="mrs_01_1040__li19456458151610">The system administrator understands the service requirements and has prepared a Kafka administrator user that belongs to the <strong id="mrs_01_1040__b146273501522">kafkaadmin</strong> group. This user is not required in normal mode.</li><li id="mrs_01_1040__li9964895151620">The Kafka client has been installed.</li></ul>
</div>
<div class="section" id="mrs_01_1040__section46036500151654"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1040__ol16690725151827"><li id="mrs_01_1040__li63919712151827"><span>Log in as a client installation user to the node on which the Kafka client is installed.</span></li><li id="mrs_01_1040__li49261531151838"><span>Switch to the Kafka client installation directory, for example, <strong id="mrs_01_1040__b1543017119535">/opt/kafkaclient</strong>.</span><p><p id="mrs_01_1040__p1430181313479"><strong id="mrs_01_1040__b1484700134720">cd /opt/kafkaclient</strong></p>
</p></li><li id="mrs_01_1040__li41350393151913"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_1040__p36609218151913"><strong id="mrs_01_1040__b10692161718521">source bigdata_env</strong></p>
</p></li><li id="mrs_01_1040__li43070926151956"><span>Run the following command to authenticate the user (skip this step in normal mode):</span><p><p id="mrs_01_1040__p39415123152019"><strong id="mrs_01_1040__b15178151785311">kinit</strong> <em id="mrs_01_1040__i1517916171530">Component service user</em></p>
</p></li><li id="mrs_01_1040__li9706746152023"><span>Run the following command to switch to the Kafka directory under the client installation directory:</span><p><p id="mrs_01_1040__p917269152045"><strong id="mrs_01_1040__b8855144875318">cd Kafka/kafka</strong></p>
</p></li><li id="mrs_01_1040__li35359307152051"><span>Run the <strong id="mrs_01_1040__b11564182135316">kafka-balancer.sh</strong> script to balance the cluster. The commonly used commands are as follows:</span><p><ul id="mrs_01_1040__ul21332352152118"><li id="mrs_01_1040__li22214994152118">Run the <strong id="mrs_01_1040__b32511455125310">--run</strong> command to perform cluster balancing: <p id="mrs_01_1040__p26053051152143"><strong id="mrs_01_1040__b137021357155316">./bin/kafka-balancer.sh --run --zookeeper</strong> <em id="mrs_01_1040__i07085577533">&lt;</em><em id="mrs_01_1040__i171025713533"><em id="mrs_01_1040__i27096579534"><em id="mrs_01_1040__i11709457125315">Service IP address of any ZooKeeper node</em></em>:zkPort<strong id="mrs_01_1040__b37091757195316">/kafka</strong>&gt;</em> <strong id="mrs_01_1040__b1471045720535">--bootstrap-server</strong> <em id="mrs_01_1040__i07101557155315">&lt;Kafka</em><em id="mrs_01_1040__i5711157105314"> cluster IP address:port&gt;</em> <strong id="mrs_01_1040__b15991167115410">--throttle 10000000 --consumer-config config/consumer.properties --enable-az-aware --show-details</strong></p>
<p id="mrs_01_1040__p49904411152217">This command generates a balancing solution and then executes it. <strong id="mrs_01_1040__b9114165621">--show-details</strong> is optional and prints the solution details. <strong id="mrs_01_1040__b329815724">--throttle</strong> sets the bandwidth limit, in bytes per second, that applies while the balancing solution is executed. <strong id="mrs_01_1040__b1233864916213">--enable-az-aware</strong> indicates that the cross-AZ feature is taken into account when the balancing solution is generated. Use this parameter only if the cross-AZ feature has been enabled for the cluster.</p>
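<p>The options described above can be assembled in a small wrapper, shown here as a minimal sketch. The function name <strong>build_balance_cmd</strong>, its argument order, and the example addresses are hypothetical; only the flags come from this section. The function prints the command instead of running it, so the result can be reviewed before it is executed from the <strong>Kafka/kafka</strong> directory of the client. <strong>--enable-az-aware</strong> is left out on the assumption that the cross-AZ feature is not enabled.</p>

```shell
#!/usr/bin/env bash
# Minimal sketch: print (do not run) a kafka-balancer.sh --run command line.
# build_balance_cmd and its arguments are illustrative assumptions.
build_balance_cmd() {
  local zk_addr="$1"               # <service IP of any ZooKeeper node>:<zkPort>
  local bootstrap="$2"             # <Kafka cluster IP address>:<port>
  local throttle="${3:-10000000}"  # bandwidth limit in bytes per second

  # Each argument is printed followed by a space; the last one ends the line.
  printf '%s ' ./bin/kafka-balancer.sh --run \
    --zookeeper "${zk_addr}/kafka" \
    --bootstrap-server "${bootstrap}" \
    --throttle "${throttle}" \
    --consumer-config config/consumer.properties
  printf '%s\n' --show-details
}

# Placeholder addresses; replace them with the values of your cluster.
build_balance_cmd "192.0.2.10:2181" "192.0.2.20:21007"
```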
</li><li id="mrs_01_1040__li44659366152229">Run the <strong id="mrs_01_1040__b19429111010557">--run</strong> command to decommission a node:<p id="mrs_01_1040__p54301942152240"><strong id="mrs_01_1040__b6197161214555">./bin/kafka-balancer.sh --run --zookeeper </strong><em id="mrs_01_1040__i1120571265510">&lt;<em id="mrs_01_1040__i52042012145517"><em id="mrs_01_1040__i1320341225516">Service IP address of any ZooKeeper node</em></em>:zkPort<strong id="mrs_01_1040__b14205121216554">/kafka</strong>&gt;</em> <strong id="mrs_01_1040__b820611127556">--bootstrap-server</strong> <em id="mrs_01_1040__i8207171265511">&lt;Kafka cluster IP address:port&gt;</em> <strong id="mrs_01_1040__b220721235514">--throttle 10000000 --consumer-config config/consumer.properties --remove-brokers </strong><em id="mrs_01_1040__i3208112195519">&lt;BrokerId list&gt;</em><strong id="mrs_01_1040__b149206418547"> --enable-az-aware --force</strong></p>
<p id="mrs_01_1040__p26885854152252">In the command, <strong id="mrs_01_1040__b285855412317">--remove-brokers</strong> specifies the IDs of the Brokers to be removed. Separate multiple Broker IDs with commas (,). <strong id="mrs_01_1040__b164319151941">--force</strong> is optional and forces the tool to generate the migration solution even if a disk usage alarm exists. <strong id="mrs_01_1040__b122815344415">--enable-az-aware</strong> is optional and indicates that the cross-AZ feature is taken into account when the balancing solution is generated. Use this parameter only if the cross-AZ feature has been enabled for the cluster.</p>
<div class="note" id="mrs_01_1040__note1412815525454"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1040__p912925211456">This command migrates data on the Broker nodes to be decommissioned to other Broker nodes.</p>
</div></div>
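<p>Because <strong>--remove-brokers</strong> expects a comma-separated list, a helper that validates and joins the Broker IDs can reduce typing mistakes. The following is a minimal sketch; <strong>join_broker_ids</strong> is a hypothetical helper that is not part of the Kafka client, and the IDs 3 and 4 are made-up examples.</p>

```shell
#!/usr/bin/env bash
# Minimal sketch: build the comma-separated value for --remove-brokers.
# join_broker_ids is a hypothetical helper, not part of the Kafka client.
join_broker_ids() {
  local id list=""
  for id in "$@"; do
    # Reject empty or non-numeric IDs before they reach the balancing tool.
    case "$id" in
      ''|*[!0-9]*) echo "invalid broker ID: $id" >&2; return 1 ;;
    esac
    list="${list:+${list},}${id}"
  done
  printf '%s\n' "$list"
}

# Example with made-up Broker IDs; only the option is printed, nothing is run.
ids="$(join_broker_ids 3 4)"
printf '%s\n' "--remove-brokers ${ids}"
```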
</li><li id="mrs_01_1040__li25104541152330">Run the following command to view the execution status: <p id="mrs_01_1040__p65034562152347"><a name="mrs_01_1040__li25104541152330"></a><a name="li25104541152330"></a><strong id="mrs_01_1040__b189324586559">./bin/kafka-balancer.sh --status --zookeeper</strong> <em id="mrs_01_1040__i79411358195520">&lt;<em id="mrs_01_1040__i3939058145514"><em id="mrs_01_1040__i9938205816557">Service IP address of any ZooKeeper node</em></em>:zkPort<strong id="mrs_01_1040__b1494035885515">/kafka</strong>&gt;</em></p>
</li><li id="mrs_01_1040__li5567054152359">Run the following command to generate a balancing solution:<p id="mrs_01_1040__p24048929152438"><a name="mrs_01_1040__li5567054152359"></a><a name="li5567054152359"></a><strong id="mrs_01_1040__b119736175720">./bin/kafka-balancer.sh --generate --zookeeper</strong> <em id="mrs_01_1040__i179801410576">&lt;</em><em id="mrs_01_1040__i12984811572"><em id="mrs_01_1040__i4982181185712"><em id="mrs_01_1040__i169810112572">Service IP address of any ZooKeeper node</em></em>:zkPort<strong id="mrs_01_1040__b2098331115717">/kafka</strong>&gt;</em><strong id="mrs_01_1040__b398511145713"> --bootstrap-server </strong><em id="mrs_01_1040__i298720110570">&lt;Kafka</em><em id="mrs_01_1040__i29887113576"> cluster IP address:port&gt;</em> <strong id="mrs_01_1040__b1310891755516">--consumer-config config/consumer.properties --enable-az-aware</strong></p>
<p id="mrs_01_1040__p38042917152421">This command is used to generate a migration solution based on the current cluster status and print the solution to the console. <strong id="mrs_01_1040__b1936528256">--enable-az-aware</strong> is optional, indicating that the cross-AZ feature is enabled when a migration solution is generated. If this parameter is used, ensure that the cross-AZ feature has been enabled for the cluster.</p>
</li><li id="mrs_01_1040__li62064580152451">Run the following command to clear the intermediate status:<p id="mrs_01_1040__p44916214152512"><a name="mrs_01_1040__li62064580152451"></a><a name="li62064580152451"></a><strong id="mrs_01_1040__b1636714388582">./bin/kafka-balancer.sh --clean --zookeeper </strong><em id="mrs_01_1040__i5372438105810">&lt;</em><em id="mrs_01_1040__i1437413816586"><em id="mrs_01_1040__i17373638135812"><em id="mrs_01_1040__i163731738125816">Service IP address of any ZooKeeper node</em></em>:zkPort<strong id="mrs_01_1040__b1337316383586">/kafka</strong>&gt;</em></p>
<p id="mrs_01_1040__p37417771152514">This command clears the intermediate status information stored in ZooKeeper when a migration has not completed.</p>
<div class="notice" id="mrs_01_1040__note59683659155046"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="mrs_01_1040__p282025155046">The port of the Kafka cluster is 21007 in security mode and 9092 in normal mode.</p>
</div></div>
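<p>When the commands are scripted, the port can be derived from the cluster mode instead of being hard-coded. The following is a minimal sketch; <strong>kafka_port</strong> and the mode names <strong>security</strong> and <strong>normal</strong> are hypothetical, and 192.0.2.20 is a placeholder address.</p>

```shell
#!/usr/bin/env bash
# Minimal sketch: map the cluster mode to the Kafka port given in the notice.
# kafka_port and its mode names are illustrative assumptions.
kafka_port() {
  case "$1" in
    security) echo 21007 ;;
    normal)   echo 9092 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Example: build a --bootstrap-server value for a security-mode cluster.
printf '%s\n' "--bootstrap-server 192.0.2.20:$(kafka_port security)"
```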
</li></ul>
</p></li></ol>
</div>
<div class="section" id="mrs_01_1040__section17391780194723"><h4 class="sectiontitle">Troubleshooting</h4><p id="mrs_01_1040__p17143722194752">During partition migration using the Kafka balancing tool, if the execution progress of the balancing tool is blocked due to a Broker fault in the cluster, you need to manually rectify the fault. The scenarios are as follows:</p>
<ul id="mrs_01_1040__ul55348467194819"><li id="mrs_01_1040__li59315838194819">The Broker is faulty because the disk usage reaches 100%. <ol id="mrs_01_1040__ol142219291029"><li id="mrs_01_1040__li4221229429">Log in to FusionInsight Manager and choose <span id="mrs_01_1040__text481915153511"><strong id="mrs_01_1040__b147333532224">Cluster</strong> &gt; <em id="mrs_01_1040__i4640056142213">Name of the desired cluster</em> &gt; </span><strong id="mrs_01_1040__b115805592221">Services</strong> &gt; <strong id="mrs_01_1040__b152635519230">Kafka</strong> &gt; <strong id="mrs_01_1040__b1921567162317">Instance</strong>. Stop the Broker instance in the <span class="parmname" id="mrs_01_1040__p33f63ce2b3484a958e31c5b02940569c"><b>Restoring</b></span> state, and record the management IP address of the node where the instance resides and the corresponding <span class="parmname" id="mrs_01_1040__p5c2c2960a99949798cc94409022a98e5"><b>broker.id</b></span>. To view the value, click the role name, select <strong id="mrs_01_1040__b11992143192415">All Configurations</strong> on the <strong id="mrs_01_1040__b3992183810247">Instance Configurations</strong> page, and search for the <strong id="mrs_01_1040__b12280204782410">broker.id</strong> parameter.</li><li id="mrs_01_1040__li125161232924">Log in to the node at the recorded management IP address as user <strong id="mrs_01_1040__b17901841258">root</strong>, and run the <strong id="mrs_01_1040__b1536912815256">df -lh</strong> command to find the mounted directory whose disk usage is 100%, for example, <span class="filepath" id="mrs_01_1040__fd6e5c46006c6425a9ef69424ac924865"><b>${BIGDATA_DATA_HOME}/kafka/data1</b></span>.</li><li id="mrs_01_1040__li13852492213">Go to the directory and run the <strong id="mrs_01_1040__ac97bb553a3564a87b2f5e33e5a1eab2e">du -sh *</strong> command to view the size of each file in the directory. Check whether files other than those in the <span class="filepath" id="mrs_01_1040__fb86d6e2e66a4462f86f2aba9c9e1c489"><b>kafka-logs</b></span> directory exist, and determine whether these files can be deleted or migrated.<ul id="mrs_01_1040__u38bab1deb4d34a1386f090d17c515b08"><li id="mrs_01_1040__l2999d98d179d42e39d30bb8b3d0eaf21">If yes, delete or migrate the related data and go to <a href="#mrs_01_1040__li286010416517">8</a>.</li><li id="mrs_01_1040__l2b42313b30db4597ad54eac0b6f4f555">If no, go to <a href="#mrs_01_1040__li207716388315">4</a>.</li></ul>
</li><li id="mrs_01_1040__li207716388315"><a name="mrs_01_1040__li207716388315"></a><a name="li207716388315"></a>Go to the <span class="filepath" id="mrs_01_1040__fa553ac2240d34e0fb4b8115ab027e7af"><b>kafka-logs</b></span> directory, run the <strong id="mrs_01_1040__ae5f45cb84f804e49bd48e0d444c22785">du -sh *</strong> command, and select a partition folder to be moved. Partition folders are named in the format <span class="filepath" id="mrs_01_1040__f836da7286d76482f84407ed15d2da359"><b>Topic name-Partition ID</b></span>. Record the topic and partition.</li><li id="mrs_01_1040__l847204e787034666b0ffc45eaaaf2cd4"><a name="mrs_01_1040__l847204e787034666b0ffc45eaaaf2cd4"></a><a name="l847204e787034666b0ffc45eaaaf2cd4"></a>Modify the <span class="filepath" id="mrs_01_1040__f19babbcb4faa450db075e9284e7042bc"><b>recovery-point-offset-checkpoint</b></span> and <span class="filepath" id="mrs_01_1040__f6f04aaff07af4f8db46c1d3b2486a87e"><b>replication-offset-checkpoint</b></span> files in the <span class="filepath" id="mrs_01_1040__f5c6cd8dccdd44d6aaf5ea0bb41346118"><b>kafka-logs</b></span> directory in the same way.<ol type="a" id="mrs_01_1040__ol98321951193410"><li id="mrs_01_1040__li1832145116344">Decrease the number in the second line of the file. (To remove multiple partition directories, decrease the number by the number of directories to be removed.)</li><li id="mrs_01_1040__li168327516345">Delete the line of the partition to be removed. (The line structure is "<em id="mrs_01_1040__i1302155462810">Topic name Partition ID Offset</em>". Save the line before deleting it, because it must later be added to the file of the same name in the destination directory.)</li></ol>
</li><li id="mrs_01_1040__li173751231545">Modify the <span class="filepath" id="mrs_01_1040__f26636e30e7a9415e8b2306ec6e21c47d"><b>recovery-point-offset-checkpoint</b></span> and <span class="filepath" id="mrs_01_1040__fd7350e627c324605b5491778db40731f"><b>replication-offset-checkpoint</b></span> files in the destination data directory (for example, <span class="filepath" id="mrs_01_1040__f2663cc09fbc249c0b94cfbb4904dd320"><b>${BIGDATA_DATA_HOME}/kafka/data2/kafka-logs</b></span>) in the same way.<ul id="mrs_01_1040__u03fe2aaee97f44f2a26fccb751250335"><li id="mrs_01_1040__l5c20275731a542b2a78efe8e2ccab97b">Increase the number in the second line of the file. (To move multiple partition directories, increase the number by the number of directories to be moved.)</li><li id="mrs_01_1040__l97f01ca2c58340de8ac52a1f43f3e69b">Add the line of the partition to be moved to the end of the file. (The line structure is "<em id="mrs_01_1040__i1622120150304">Topic name Partition ID Offset</em>". You can copy the line saved in <a href="#mrs_01_1040__l847204e787034666b0ffc45eaaaf2cd4">5</a>.)</li></ul>
</li><li id="mrs_01_1040__li878391652">Move the partition to the destination directory. After the partition is moved, run the <strong id="mrs_01_1040__a9a2a813953da48659fc0584f5b92d878">chown omm:wheel -R</strong> <i><span class="varname" id="mrs_01_1040__v93909bc51a2b4238a1023df50cb8df19">Partition directory</span></i> command to change the owner and group of the partition directory.</li><li id="mrs_01_1040__li286010416517"><a name="mrs_01_1040__li286010416517"></a><a name="li286010416517"></a>Log in to FusionInsight Manager and choose <span id="mrs_01_1040__text1675711468612"><strong id="mrs_01_1040__b955216197346">Cluster</strong> &gt; <em id="mrs_01_1040__i4391122293420">Name of the desired cluster</em> &gt; </span><strong id="mrs_01_1040__b10776296349">Services</strong> &gt; <strong id="mrs_01_1040__b178451432173410">Kafka</strong> &gt; <strong id="mrs_01_1040__b29501034103416">Instance</strong> to start the stopped Broker instance.</li><li id="mrs_01_1040__l534eeb8b9a5f4ae281144676bf7ddf50">Wait for 5 to 10 minutes and check whether the health status of the Broker instance is <span class="parmname" id="mrs_01_1040__p4748a28faa1942ea809afe81c98d09ad"><b>Good</b></span>.<ul id="mrs_01_1040__u3d9bcc0a1c324d99ab4b7a8bf6225085"><li id="mrs_01_1040__l94552b6bada14f7186068e915bf49de1">If yes, resolve the insufficient disk capacity according to the handling procedure of "ALM-38001 Insufficient Kafka Disk Capacity" after the alarm is cleared.</li><li id="mrs_01_1040__l3e9755cd98ce4da0833f8d6de1787bfd">If no, contact O&amp;M support.</li></ul>
</li></ol>
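<p>The checkpoint file edits in steps 5 and 6 can be sketched as a script. This is an illustration under stated assumptions and not a supported tool: <strong>move_checkpoint_entry</strong> is a hypothetical helper, the file layout is assumed to be a version line, an entry count on the second line, and then one "<em>Topic name Partition ID Offset</em>" line per partition, and the Broker is assumed to be stopped with both files backed up beforehand.</p>

```shell
#!/usr/bin/env bash
# Sketch: move one "topic partition offset" entry between two checkpoint files.
# Assumed layout: line 1 = version, line 2 = entry count, then one entry per line.
# move_checkpoint_entry is a hypothetical helper; back up both files first.
move_checkpoint_entry() {
  local src="$1" dst="$2" topic="$3" partition="$4"
  local entry

  # Save the line of the partition before it is deleted from the source file.
  entry="$(awk -v t="$topic" -v p="$partition" \
    'NR > 2 && $1 == t && $2 == p' "$src")"
  [ -n "$entry" ] || { echo "entry not found in $src" >&2; return 1; }

  # Source file: decrease the count on line 2 and drop the entry.
  awk -v t="$topic" -v p="$partition" '
    NR == 2 { print ($1 - 1); next }
    NR > 2 && $1 == t && $2 == p { next }
    { print }' "$src" > "$src.tmp" || return 1
  # Write back with cat so that the owner and mode of the file are kept.
  cat "$src.tmp" > "$src" && rm -f "$src.tmp"

  # Destination file: increase the count on line 2, then append the entry.
  awk 'NR == 2 { print ($1 + 1); next } { print }' "$dst" > "$dst.tmp" || return 1
  cat "$dst.tmp" > "$dst" && rm -f "$dst.tmp"
  printf '%s\n' "$entry" >> "$dst"
}
```

<p>Under these assumptions, the helper would be called once for <strong>recovery-point-offset-checkpoint</strong> and once for <strong>replication-offset-checkpoint</strong>, each time with the source file, the destination file, the topic name, and the partition ID recorded in step 4.</p>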
<p id="mrs_01_1040__p20578345195914">After the faulty Broker is recovered, the blocked balancing task continues. You can run the <strong id="mrs_01_1040__b18954105103615">--status</strong> command to view the task execution progress. </p>
</li><li id="mrs_01_1040__li50211928195118">The Broker is faulty for another reason, the cause is clear, and the fault can be rectified within a short period of time. <ol id="mrs_01_1040__ol5006345195151"><li id="mrs_01_1040__li46509375195151">Restore the faulty Broker according to the root cause.</li><li id="mrs_01_1040__li6683901419520">After the faulty Broker is recovered, the blocked balancing task continues. You can run the <strong id="mrs_01_1040__b1481710415018">--status</strong> command to view the task execution progress.</li></ol>
</li><li id="mrs_01_1040__li46202717195227">The Broker is faulty for another reason, the fault scenario is complex, and the fault cannot be rectified within a short period of time. <ol id="mrs_01_1040__ol19330120195249"><li id="mrs_01_1040__li8595299195249">Run the <strong id="mrs_01_1040__b6871144718019">kinit</strong> <em id="mrs_01_1040__i6876204711020">Kafka</em><em id="mrs_01_1040__i138779470010"> administrator account</em> command (skip this step in normal mode).</li><li id="mrs_01_1040__li64934210195257">Run the <strong id="mrs_01_1040__b406531807">zkCli.sh -server</strong> <strong id="mrs_01_1040__b25155314016">&lt;</strong><em id="mrs_01_1040__i7619531902">ZooKeeper cluster service IP address</em>:<em id="mrs_01_1040__i9725314014">zkPort</em><strong id="mrs_01_1040__b176539016">/kafka</strong><strong id="mrs_01_1040__b7865311017">&gt;</strong> command to log in to the ZooKeeper shell.</li><li id="mrs_01_1040__li53335126195317">Run the <strong id="mrs_01_1040__b9254991918">addauth krbgroup</strong> command (skip this step in normal mode).</li><li id="mrs_01_1040__li41486537195323">Delete the <span class="filepath" id="mrs_01_1040__filepath131481048165714"><b>/admin/reassign_partitions</b></span> and <span class="filepath" id="mrs_01_1040__filepath1837165016577"><b>/controller</b></span> directories.</li><li id="mrs_01_1040__li7563278195332">The preceding steps forcibly stop the migration. After the cluster recovers, run the <strong id="mrs_01_1040__b1071183716117">kafka-reassign-partitions.sh</strong> command to delete the redundant replicas generated during the migration.</li></ol>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0375.html">Using Kafka</a></div>
</div>
</div>