Yang, Tong 3f5759eed2 MRS comp-lts 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2023-01-19 17:08:45 +00:00

<a name="mrs_01_1065"></a><a name="mrs_01_1065"></a>
<h1 class="topictitle1">Typical Scenario: Collecting Logs from Kafka and Uploading Them to HDFS</h1>
<div id="body8662426"><div class="section" id="mrs_01_1065__sa06487045b2a46959003725608759034"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1065__ad2f1b0536d754b7985d8044694d2b0cd">This section describes how to use the Flume client to collect logs from the Kafka topic list (test1 in this example) and save them to the <span class="filepath" id="mrs_01_1065__filepath188428386955739"><b>/flume/test</b></span> directory on HDFS.</p>
<div class="note" id="mrs_01_1065__nc300101343a147f6a176ea799c4a4c57"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="mrs_01_1065__acb5b871afc9c48c69dd7f66475dbba5d">By default, the cluster network environment is secure and SSL authentication is not enabled for data transmission. For details about how to use the encryption mode, see <a href="mrs_01_1069.html">Configuring the Encrypted Transmission</a>. The configuration in this section applies to scenarios where only Flume is configured, for example, Kafka Source + Memory Channel + HDFS Sink.</p>
</div></div>
</div>
<div class="section" id="mrs_01_1065__s04430e3bb5244658b9b6a46f0f0fd94b"><h4 class="sectiontitle">Prerequisites</h4><ul id="mrs_01_1065__u1feede73da194b919e884e199fe992cb"><li id="mrs_01_1065__l1b1790aa95f0401ba4a902cd2b115feb">The cluster and the HDFS, Kafka, and Flume services have been installed.</li><li id="mrs_01_1065__li1546131633911">The Flume client has been installed. For details about how to install the client, see <a href="mrs_01_1595.html">Installing the Flume Client on Clusters</a>.</li><li id="mrs_01_1065__l1e83c3cfd1ab42de873e171eb6faffab">The network environment of the cluster is secure.</li><li id="mrs_01_1065__l41ce6de757c84926a5de5cb39ebf1c54">You have created user <strong id="mrs_01_1065__b9306122217568">flume_hdfs</strong> and granted it permissions on the HDFS directory and data to be accessed during log verification. For details, see <a href="mrs_01_1856.html">Adding a Ranger Access Permission Policy for HDFS</a>.</li></ul>
</div>
<div class="section" id="mrs_01_1065__section111726401311"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1065__o540af6194cdd4007ae77e6d9ee5b9fdf"><li id="mrs_01_1065__ldba310a477e84e65bb39d8422f3f9664"><span>On FusionInsight Manager, choose <span class="menucascade" id="mrs_01_1065__menucascade22091716255739"><b><span class="uicontrol" id="mrs_01_1065__uicontrol210808680055739">System &gt; User</span></b></span>, and then choose <span class="menucascade" id="mrs_01_1065__menucascade127193121655739"><b><span class="uicontrol" id="mrs_01_1065__uicontrol77877507355739">More &gt; Download Authentication Credential</span></b></span> to download the Kerberos certificate file of user <strong id="mrs_01_1065__b195394508355739">flume_hdfs</strong> and save it to the local host.</span></li><li id="mrs_01_1065__l16a8a754e9654e03adf7ca038acb5b2b"><span>Configure the client parameters of the Flume role.</span><p><div class="p" id="mrs_01_1065__p65916261428">Use the Flume configuration tool on FusionInsight Manager to configure the Flume role client parameters and generate a configuration file.<ol type="a" id="mrs_01_1065__oa430f9283d074f839f46beb10194944a"><li id="mrs_01_1065__l815a6e6ed66f447a8bbacb4cb4abda77">Log in to FusionInsight Manager and choose <strong id="mrs_01_1065__b8259171211011">Cluster</strong> &gt; <strong id="mrs_01_1065__b14260191211106">Services</strong>. On the page that is displayed, choose <strong id="mrs_01_1065__b32601512131014">Flume</strong>. On the displayed page, click the <strong id="mrs_01_1065__b92601512121016">Configuration Tool</strong> tab.</li><li id="mrs_01_1065__l766f1033789447cc9d824597e5687aa6">Set <strong id="mrs_01_1065__b21834960655739">Agent Name</strong> to <strong id="mrs_01_1065__b211605281855739">client</strong>.
Select the source, channel, and sink to be used, drag them to the GUI on the right, and connect them.<p id="mrs_01_1065__a9cd223e3f6984c09ab85ca3280c9bd45">For example, use Kafka Source, Memory Channel, and HDFS Sink.</p>
<div class="fignone" id="mrs_01_1065__fe509bef7af8947128cea5a26ea3cd336"><span class="figcap"><b>Figure 1 </b>Example for the Flume configuration tool</span><br><span><img id="mrs_01_1065__image9296103471012" src="en-us_image_0000001295740052.png"></span></div>
</li><li id="mrs_01_1065__lf82319791edf4cd1a6af4ac3d2b17ece">Double-click the source, channel, and sink. Set the corresponding configuration parameters by referring to <a href="#mrs_01_1065__t6c3b4afafa084081b9b2d9400d6ea379">Table 1</a> and based on the actual environment.<div class="note" id="mrs_01_1065__n12c0807eb21342be8b372535b9673cbe"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1065__u318b45ac8d8341578c77bb498420decc"><li id="mrs_01_1065__l0a2de2d5bf434353ba558a76d4c635ab">If you want to keep using an existing <strong id="mrs_01_1065__b433055013103">properties.properties</strong> file and modify it, log in to FusionInsight Manager and choose <strong id="mrs_01_1065__b733205021016">Cluster</strong> &gt; <em id="mrs_01_1065__i43331950181018">Name of the desired cluster</em> &gt; <strong id="mrs_01_1065__b633445016103">Services</strong>. On the page that is displayed, choose <strong id="mrs_01_1065__b5335125061012">Flume</strong>. On the displayed page, click the <strong id="mrs_01_1065__b1233535020106">Configuration Tool</strong> tab, click <strong id="mrs_01_1065__b16336175051015">Import</strong>, import the file, and modify the configuration items related to non-encrypted transmission.</li><li id="mrs_01_1065__l503aff2a2d7748ba861a9fc8eaa40c9c">When importing a configuration file, it is recommended that the numbers of sources, channels, and sinks do not exceed 40. Otherwise, the response time may be very long.</li></ul>
</div></div>
</li><li id="mrs_01_1065__l92b924df515f493daa8ec019ca9fcec4"><a name="mrs_01_1065__l92b924df515f493daa8ec019ca9fcec4"></a><a name="l92b924df515f493daa8ec019ca9fcec4"></a>Click <strong id="mrs_01_1065__b137094315571">Export</strong> to save the <strong id="mrs_01_1065__b1437074365710">properties.properties</strong> configuration file to the local host.
<div class="tablenoborder"><a name="mrs_01_1065__t6c3b4afafa084081b9b2d9400d6ea379"></a><a name="t6c3b4afafa084081b9b2d9400d6ea379"></a><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1065__t6c3b4afafa084081b9b2d9400d6ea379" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters to be modified for the Flume role client</caption><thead align="left"><tr id="mrs_01_1065__rfdf1becea1eb4b508e4b37d2a18f6218"><th align="left" class="cellrowborder" valign="top" width="33%" id="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1"><p id="mrs_01_1065__ab3de987dff774b7b88eec93b89891c50">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33%" id="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2"><p id="mrs_01_1065__ab56e4125e19a45f3931f9bfc187235ce">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="34%" id="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3"><p id="mrs_01_1065__a0b2835c613dd44f092d034fafbe5d53d">Example Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1065__re79aa5231fbe478cb8032f66b24f916f"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__a81dc5e5311ee42b181173eac4bd53584">Name</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__a1610804f5a5e40f398d2ab84bc838c1b">The value must be unique and cannot be left blank.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__a2ac9898f0f7f4966a0b1e4dab88ae4f9">test</p>
</td>
</tr>
<tr id="mrs_01_1065__r3000c2fbb2444967bbef70b628072de0"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__a7a15ac3cad774a179a6f731fa4a908a1">kafka.topics</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__a4ace6cd0e5424745b2a5d874cc6737c8">Specifies the subscribed Kafka topic list, in which topics are separated by commas (,). This parameter cannot be left blank.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__a1e3f665a9d2040008c2a5a265aebff61">test1</p>
</td>
</tr>
<tr id="mrs_01_1065__rb996deaa92014c8795eafc15013a7fcb"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__ac3d38384e14a4a38b6811ec06ffcdd0f">kafka.consumer.group.id</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__a1d73bb95909249cc8915a7deb99211e2">Specifies the ID of the consumer group used to obtain data from Kafka. This parameter cannot be left blank.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__aa2e92ff7b98f4d14b85cde8503974255">flume</p>
</td>
</tr>
<tr id="mrs_01_1065__rb1ec79e8d2e34950b5debac58a790deb"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__en-us_topic_0060039168_p520422315228">kafka.bootstrap.servers</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__a347bd48a4b2e483aa2a54743ebecae2d">Specifies the list of bootstrap IP addresses and ports of Kafka. By default, the list contains all Kafka brokers in the Kafka cluster. If Kafka has been installed in the cluster and its configurations have been synchronized, this parameter can be left blank.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__ac88cd7deac254bb5859a1e2545700d2e">192.168.101.10:9092</p>
</td>
</tr>
<tr id="mrs_01_1065__rc0ec71524ff349b18d046f11673e470f"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__a89cabb1f684e4c489a00db0bec2be4cb">batchSize</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__en-us_topic_0060039168_p702749144830">Specifies the number of events that Flume sends in a batch (number of data pieces).</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__ac25dc2cb628645f9b635f9b8170173a8">61200</p>
</td>
</tr>
<tr id="mrs_01_1065__re5b78c392e5e4e62a9f83996d983ff97"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p1075110221124">hdfs.path</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p11751522101216">Specifies the HDFS data write directory. This parameter cannot be left blank.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p875142211219">hdfs://hacluster/flume/test</p>
</td>
</tr>
<tr id="mrs_01_1065__r93040975d3bc46fe8e6d471d83b2ed18"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p12751622121214">hdfs.inUsePrefix</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p97511422131217">Specifies the prefix of the file that is being written to HDFS.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p17519226125">TMP_</p>
</td>
</tr>
<tr id="mrs_01_1065__rbbb29cd2a0e94240a7921d20f9d9afd9"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p7751622141216">hdfs.batchSize</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p1375115224127">Specifies the maximum number of events that can be written to HDFS at a time.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p675152291216">61200</p>
</td>
</tr>
<tr id="mrs_01_1065__r61fbec49005140828cb7c2a8e054e4f9"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p167510224123">hdfs.kerberosPrincipal</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p1675152271212">Specifies the user for Kerberos authentication. This parameter is mandatory in security clusters and is required only in such clusters.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p1575214226126">flume_hdfs</p>
</td>
</tr>
<tr id="mrs_01_1065__r086cd3151a894757b5b8e8c3233ff30c"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p1775222219124">hdfs.kerberosKeytab</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p11752142261215">Specifies the path of the keytab file for Kerberos authentication. This parameter is mandatory in security clusters and is required only in such clusters.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p1375262211217">/opt/test/conf/user.keytab</p>
<div class="note" id="mrs_01_1065__note18752132215128"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="mrs_01_1065__p67525224125">Obtain the <strong id="mrs_01_1065__b100480318155739">user.keytab</strong> file from the Kerberos certificate file of user <strong id="mrs_01_1065__b27405678355739">flume_hdfs</strong>. In addition, ensure that the user who installs and runs the Flume client has read and write permissions on the <strong id="mrs_01_1065__b64857193355739">user.keytab</strong> file.</p>
</div></div>
</td>
</tr>
<tr id="mrs_01_1065__r4ed2a2c094d540c49ce8e6fc501a079f"><td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.1 "><p id="mrs_01_1065__p1875272211128">hdfs.useLocalTimeStamp</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.2 "><p id="mrs_01_1065__p675216225127">Specifies whether to use the local time. Possible values are <strong id="mrs_01_1065__b33130898155739">true</strong> and <strong id="mrs_01_1065__b182695029255739">false</strong>.</p>
</td>
<td class="cellrowborder" valign="top" width="34%" headers="mcps1.3.3.2.2.2.1.1.4.3.2.4.1.3 "><p id="mrs_01_1065__p197521722171218">true</p>
</td>
</tr>
</tbody>
</table>
</div>
</li></ol>
</div>
</p></li><li id="mrs_01_1065__laf28e4a6809f4c3c884dd080c2d2e41a"><span>Upload the configuration file.</span><p><p id="mrs_01_1065__p1771720354579">Upload the file exported in <a href="#mrs_01_1065__l92b924df515f493daa8ec019ca9fcec4">2.d</a> to the <em id="mrs_01_1065__i4924175816416">Flume client installation directory</em><strong id="mrs_01_1065__b1823910175429">/fusioninsight-flume-</strong><span id="mrs_01_1065__text0410535195415"><em id="mrs_01_1065__i174101235175418">Flume component version number</em></span><strong id="mrs_01_1065__b6984113415429">/conf</strong> directory of the cluster.</p>
</p></li></ol><ol start="4" id="mrs_01_1065__ofa622de0520745d5ad045abc022b9cf6"><li id="mrs_01_1065__le7801192c8734a32a4208d6292de2d4a"><span>Verify log transmission.</span><p><ol type="a" id="mrs_01_1065__o71eb2167b0894dc0acbd61933ab793cf"><li id="mrs_01_1065__lc066c3f5cd3540ef838a483303d05abf">Log in to FusionInsight Manager as a user who has management permissions on HDFS. For details, see <a href="mrs_01_2124.html">Accessing FusionInsight Manager</a>. Choose <strong id="mrs_01_1065__b524525131113">Cluster</strong> &gt; <strong id="mrs_01_1065__b12461551111">Services</strong> &gt; <strong id="mrs_01_1065__b102467511115">HDFS</strong>. On the page that is displayed, click the <strong id="mrs_01_1065__b1324714511119">NameNode(</strong><em id="mrs_01_1065__i1024712512115">Node name</em><strong id="mrs_01_1065__b72483511115">,Active)</strong> link next to <strong id="mrs_01_1065__b192491452114">NameNode WebUI</strong> to go to the HDFS web UI. On the displayed page, choose <strong id="mrs_01_1065__b1524918517117">Utilities</strong> &gt; <strong id="mrs_01_1065__b22506541114">Browse the file system</strong>.</li><li id="mrs_01_1065__l25c000ab134c44b396bbdb8c181c0979">Check whether data has been generated in the <strong id="mrs_01_1065__b110274047455739">/flume/test</strong> directory on HDFS.<div class="fignone" id="mrs_01_1065__ff9e5b8d0b1434650881f34b40ebd90aa"><span class="figcap"><b>Figure 2 </b>Checking HDFS directories and files</span><br><span><img id="mrs_01_1065__image47851062344" src="en-us_image_0000001349059705.png"></span></div>
</li></ol>
</p></li></ol>
</div>
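<div class="section"><h4 class="sectiontitle">Example Configuration File</h4><p>The example values in <a href="#mrs_01_1065__t6c3b4afafa084081b9b2d9400d6ea379">Table 1</a> correspond roughly to the following <strong>properties.properties</strong> content. This is a minimal sketch that assumes the property names and component types of open-source Apache Flume. The component names (<strong>kafka_source</strong>, <strong>memory_channel</strong>, and <strong>hdfs_sink</strong>) and the Memory Channel capacity values are illustrative assumptions and do not come from this section. The file exported by the configuration tool may contain additional or differently named items, so treat the exported file as authoritative.</p>
<pre class="screen"># Agent name "client" matches the Agent Name set in the configuration tool.
# Component names below are hypothetical placeholders.
client.sources = kafka_source
client.channels = memory_channel
client.sinks = hdfs_sink

# Kafka Source (values taken from Table 1)
client.sources.kafka_source.type = org.apache.flume.source.kafka.KafkaSource
client.sources.kafka_source.kafka.topics = test1
client.sources.kafka_source.kafka.consumer.group.id = flume
client.sources.kafka_source.kafka.bootstrap.servers = 192.168.101.10:9092
client.sources.kafka_source.batchSize = 61200
client.sources.kafka_source.channels = memory_channel

# Memory Channel (capacity values are assumptions, not from Table 1)
client.channels.memory_channel.type = memory
client.channels.memory_channel.capacity = 100000
client.channels.memory_channel.transactionCapacity = 61200

# HDFS Sink (values taken from Table 1)
client.sinks.hdfs_sink.type = hdfs
client.sinks.hdfs_sink.channel = memory_channel
client.sinks.hdfs_sink.hdfs.path = hdfs://hacluster/flume/test
client.sinks.hdfs_sink.hdfs.inUsePrefix = TMP_
client.sinks.hdfs_sink.hdfs.batchSize = 61200
client.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
# Required only in security clusters
client.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume_hdfs
client.sinks.hdfs_sink.hdfs.kerberosKeytab = /opt/test/conf/user.keytab
</pre>
<p>In Apache Flume, the <strong>transactionCapacity</strong> of a Memory Channel must be at least as large as the batch sizes of the source and sink that use the channel, and <strong>capacity</strong> must be at least as large as <strong>transactionCapacity</strong>. The values above are chosen only to satisfy this constraint for a batch size of 61200. Adjust them based on the memory available to the Flume client.</p>
</div>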
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1059.html">Non-Encrypted Transmission</a></div>
</div>
</div>