Yang, Tong 48706b7552 MRS COMP-LTS 320-lts.1 version
Reviewed-by: Kacur, Michal <michal.kacur@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2024-04-12 12:51:10 +00:00


<a name="mrs_01_24124"></a><a name="mrs_01_24124"></a>
<h1 class="topictitle1">CDL Usage Instructions</h1>
<div id="body0000001583271041"><p id="mrs_01_24124__p117431354514">CDL is a simple and efficient real-time data integration service. It captures data change events from various OLTP databases and pushes them to Kafka. The Sink Connector consumes data in topics and imports the data to the software applications of big data ecosystems. In this way, data is imported to the data lake in real time.</p>
<p id="mrs_01_24124__p1483192511189">The CDL service contains two roles: CDLConnector and CDLService. CDLConnector is the instance for executing a data capture job, and CDLService is the instance for managing and creating a job.</p>
<p id="mrs_01_24124__p5131152113187">You can create data synchronization and comparison tasks on the CDLService WebUI.</p>
<div class="section" id="mrs_01_24124__section1263405432817"><h4 class="sectiontitle">Data Synchronization Tasks</h4><ul id="mrs_01_24124__ul168103919168"><li id="mrs_01_24124__li20580153718213">CDL supports the following types of data synchronization tasks:
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table14972537192216" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Data synchronization task types supported by CDL</caption><thead align="left"><tr id="mrs_01_24124__row11973173712227"><th align="left" class="cellrowborder" valign="top" width="16%" id="mcps1.3.4.2.1.1.2.4.1.1"><p id="mrs_01_24124__p1497318374226">Data source</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="28.999999999999996%" id="mcps1.3.4.2.1.1.2.4.1.2"><p id="mrs_01_24124__p8973133710223">Destination</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="55.00000000000001%" id="mcps1.3.4.2.1.1.2.4.1.3"><p id="mrs_01_24124__p29731237112216">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row7973163712229"><td class="cellrowborder" rowspan="2" valign="top" width="16%" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p1154685592520">MySQL</p>
</td>
<td class="cellrowborder" valign="top" width="28.999999999999996%" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p10974103712227">Hudi</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.4.2.1.1.2.4.1.3 "><p id="mrs_01_24124__p17974437102217">This task synchronizes data from the MySQL database to Hudi.</p>
</td>
</tr>
<tr id="mrs_01_24124__row13974193762219"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p16974203762210">Kafka</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p139741037102214">This task synchronizes data from the MySQL database to Kafka.</p>
</td>
</tr>
<tr id="mrs_01_24124__row17974037162217"><td class="cellrowborder" rowspan="2" valign="top" width="16%" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p119745373223">PgSQL</p>
</td>
<td class="cellrowborder" valign="top" width="28.999999999999996%" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p134888340295">Hudi</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.4.2.1.1.2.4.1.3 "><p id="mrs_01_24124__p1097443713223">This task synchronizes data from the PgSQL database to Hudi.</p>
</td>
</tr>
<tr id="mrs_01_24124__row5974163718228"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p8488113492914">Kafka</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p8974103722218">This task synchronizes data from the PgSQL database to Kafka.</p>
</td>
</tr>
<tr id="mrs_01_24124__row18974437202218"><td class="cellrowborder" rowspan="2" valign="top" width="16%" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p19254155417537">Hudi</p>
</td>
<td class="cellrowborder" valign="top" width="28.999999999999996%" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p15588174165310">DWS</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.4.2.1.1.2.4.1.3 "><p id="mrs_01_24124__p14975637112219">This task synchronizes data from Hudi to DWS.</p>
</td>
</tr>
<tr id="mrs_01_24124__row3975137172210"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p145254935319">ClickHouse</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p7975123710227">This task synchronizes data from Hudi to ClickHouse.</p>
</td>
</tr>
<tr id="mrs_01_24124__row20975113715225"><td class="cellrowborder" valign="top" width="16%" headers="mcps1.3.4.2.1.1.2.4.1.1 "><p id="mrs_01_24124__p152970472383">ThirdKafka</p>
</td>
<td class="cellrowborder" valign="top" width="28.999999999999996%" headers="mcps1.3.4.2.1.1.2.4.1.2 "><p id="mrs_01_24124__p3481181335614">Hudi</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.4.2.1.1.2.4.1.3 "><p id="mrs_01_24124__p18975203762216">This task synchronizes data from ThirdKafka to Hudi.</p>
</td>
</tr>
</tbody>
</table>
</div>
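<p>The combinations in Table 1 can be written as a small lookup, for example to validate a job definition before it is created. The following Python sketch is illustrative only: <strong>SUPPORTED_TASKS</strong> and <strong>is_supported</strong> are hypothetical names and are not part of CDL.</p>

```python
# Source -> destinations, transcribed from Table 1.
# SUPPORTED_TASKS and is_supported are hypothetical helpers, not CDL APIs.
SUPPORTED_TASKS = {
    "MySQL": {"Hudi", "Kafka"},
    "PgSQL": {"Hudi", "Kafka"},
    "Hudi": {"DWS", "ClickHouse"},
    "ThirdKafka": {"Hudi"},
}


def is_supported(source: str, destination: str) -> bool:
    """Return True if Table 1 lists a task type from source to destination."""
    return destination in SUPPORTED_TASKS.get(source, set())


print(is_supported("MySQL", "Hudi"))        # True
print(is_supported("MySQL", "ClickHouse"))  # False
```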
</li><li id="mrs_01_24124__li1120715316191">Usage constraints:<ul id="mrs_01_24124__ul7551131232019"><li id="mrs_01_24124__li17430842133714">To use CDL, the value of <strong id="mrs_01_24124__b12568138066">log.cleanup.policy</strong> of Kafka must be <strong id="mrs_01_24124__b1984615192613">delete</strong>.</li><li id="mrs_01_24124__li7681139201617">The CDL service must be installed in the MRS cluster.</li><li id="mrs_01_24124__li10707137164513">CDL can capture incremental data only from non-system tables. It cannot capture data from the built-in databases of source databases such as MySQL and PostgreSQL.</li><li id="mrs_01_24124__li268123915168"><a name="mrs_01_24124__li268123915168"></a><a name="li268123915168"></a>Binary logging (enabled by default in MySQL 8.0) and GTID must be enabled for the MySQL database. CDL cannot fetch tables whose names contain special characters such as the dollar sign ($).<p id="mrs_01_24124__p4801274617"><a name="mrs_01_24124__li268123915168"></a><a name="li268123915168"></a><strong id="mrs_01_24124__b13205112215322">To check whether binary logging is enabled for the MySQL database:</strong></p>
<p id="mrs_01_24124__p68632581752">Use a tool (Navicat is used in this example) or CLI to connect to the MySQL database and run the <strong id="mrs_01_24124__b17821173811120">show variables like 'log_%'</strong> command to view the configuration.</p>
<p id="mrs_01_24124__p89979273226">For example, in Navicat, choose <strong id="mrs_01_24124__b52500488527">File</strong> &gt; <strong id="mrs_01_24124__b199814515528">New Query</strong> to create a query, enter the following SQL statement, and click <strong id="mrs_01_24124__b176742219536">Run</strong>. If <strong id="mrs_01_24124__b1917112745317">log_bin</strong> is displayed as <strong id="mrs_01_24124__b1757817291534">ON</strong> in the result, binary logging is enabled.</p>
<p id="mrs_01_24124__p10585927151817"><strong id="mrs_01_24124__b8994531161811">show variables like 'log_%'</strong></p>
<p id="mrs_01_24124__p122405324195"><span><img id="mrs_01_24124__image1612548145615" src="en-us_image_0000001532472704.png"></span></p>
<p id="mrs_01_24124__p2381531302"><strong id="mrs_01_24124__b10182057193012">If binary logging is not enabled for the MySQL database, perform the following operations:</strong></p>
<p id="mrs_01_24124__p16428155612311">Modify the MySQL configuration file <strong id="mrs_01_24124__b1411574716349">my.cnf</strong> (<strong id="mrs_01_24124__b647595213348">my.ini</strong> for Windows) as follows:</p>
<pre class="screen" id="mrs_01_24124__screen19348142819455">server-id = 223344
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 10</pre>
<p id="mrs_01_24124__p9792915537">After the modification, restart MySQL for the configurations to take effect.</p>
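<p>After the restart, the effective values can be compared with the settings shown above. The following Python sketch is a hedged illustration, not part of CDL: it assumes the caller has already collected the MySQL system variables into a dictionary (for example from <strong>show variables like 'log_%'</strong> together with <strong>show variables like 'binlog_%'</strong>), and <strong>find_binlog_problems</strong> is a hypothetical helper name.</p>

```python
# Expected values taken from the my.cnf fragment above.
REQUIRED_BINLOG_SETTINGS = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def find_binlog_problems(variables: dict) -> list:
    """Compare SHOW VARIABLES output (name -> value) with the required settings.

    `variables` is assumed to be built by the caller from the query result;
    this hypothetical helper does not connect to MySQL itself.
    """
    problems = []
    for name, expected in REQUIRED_BINLOG_SETTINGS.items():
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name}: expected {expected}, found {actual or 'unset'}")
    return problems


sample = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
for problem in find_binlog_problems(sample):
    print(problem)  # binlog_format: expected ROW, found MIXED
```

<p>An empty list means that all three settings match the configuration shown above.</p>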
<p id="mrs_01_24124__p4632203714247"><strong id="mrs_01_24124__b27302073318">To check whether GTID is enabled for the MySQL database:</strong></p>
<p id="mrs_01_24124__p6958164194818">Run the <strong id="mrs_01_24124__b1400437162315">show global variables like '%gtid%'</strong> command to check whether GTID is enabled. For details, see the official documentation of the corresponding MySQL version. (For details about how to enable the function in MySQL 8.x, see <a href="https://dev.mysql.com/doc/refman/8.0/en/replication-mode-change-online-enable-gtids.html" target="_blank" rel="noopener noreferrer">https://dev.mysql.com/doc/refman/8.0/en/replication-mode-change-online-enable-gtids.html</a>.)</p>
<p id="mrs_01_24124__p4428142815525"><span><img id="mrs_01_24124__image956419296523" src="en-us_image_0000001532791924.png"></span></p>
<p id="mrs_01_24124__p4500194315919"><strong id="mrs_01_24124__b188441055165516">Set user permissions:</strong></p>
<p id="mrs_01_24124__p339147367">To execute MySQL tasks, users must have the <strong id="mrs_01_24124__b11434163617568">SELECT</strong>, <strong id="mrs_01_24124__b153691381563">RELOAD</strong>, <strong id="mrs_01_24124__b2052642175616">SHOW DATABASES</strong>, <strong id="mrs_01_24124__b1237016503568">REPLICATION SLAVE</strong> and <strong id="mrs_01_24124__b491652571">REPLICATION CLIENT</strong> permissions.</p>
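<p id="mrs_01_24124__p8114141232111">Run the following command to grant the permissions. The <strong>IDENTIFIED BY</strong> clause of <strong>GRANT</strong> is accepted only by MySQL 5.7 and earlier; in MySQL 8.0, create the user with <strong>CREATE USER</strong> first and omit the clause.</p>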
<p id="mrs_01_24124__p82014419610"><strong id="mrs_01_24124__b15582173091211">GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO</strong> '<em id="mrs_01_24124__i1851314342126">Username</em>' <strong id="mrs_01_24124__b179194131219">IDENTIFIED BY</strong> '<em id="mrs_01_24124__i14196125632317">Password</em>';</p>
<p id="mrs_01_24124__p1959343920014">Run the following command to update the permissions:</p>
<p id="mrs_01_24124__p191191810171218"><strong id="mrs_01_24124__b1915905551212">FLUSH PRIVILEGES;</strong></p>
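<p>The privilege list above can also be checked programmatically. The following Python sketch is an illustration under stated assumptions: the caller is assumed to have extracted the privilege names from the output of <strong>SHOW GRANTS</strong> for a global (<strong>ON *.*</strong>) grant, and <strong>missing_privileges</strong> is a hypothetical helper name.</p>

```python
# Privileges required for MySQL tasks, as listed above.
REQUIRED_PRIVILEGES = {
    "SELECT",
    "RELOAD",
    "SHOW DATABASES",
    "REPLICATION SLAVE",
    "REPLICATION CLIENT",
}


def missing_privileges(granted) -> list:
    """Return the required privileges absent from `granted`, sorted by name.

    `granted` is an iterable of privilege names from a global (ON *.*) grant.
    Names are upper-cased and inner whitespace is collapsed before comparison.
    """
    normalized = {" ".join(p.upper().split()) for p in granted}
    if "ALL PRIVILEGES" in normalized:
        # A global ALL PRIVILEGES grant covers every required privilege.
        return []
    return sorted(REQUIRED_PRIVILEGES - normalized)


print(missing_privileges(["select", "reload", "show databases"]))
# ['REPLICATION CLIENT', 'REPLICATION SLAVE']
```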
</li><li id="mrs_01_24124__li1868193914169"><a name="mrs_01_24124__li1868193914169"></a><a name="li1868193914169"></a>The write-ahead log (WAL) policy must be modified for the PostgreSQL database.<div class="note" id="mrs_01_24124__note295517720536"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_24124__ul395519714536"><li id="mrs_01_24124__li1395577165311">The user for connecting to the PostgreSQL database must have the replication permission and the CREATE permission on the database, and must be the owner of the tables.</li><li id="mrs_01_24124__li6955107155312">CDL cannot fetch tables whose names contain special characters such as the dollar sign ($).</li><li id="mrs_01_24124__li3955127135318">For PostgreSQL databases, you must have the permission to set the <strong id="mrs_01_24124__b966018513296">statement_timeout</strong> and <strong id="mrs_01_24124__b180715182914">lock_timeout</strong> parameters and the permission to query and delete slots and publications.</li><li id="mrs_01_24124__li5867112382712">You are advised to set <strong id="mrs_01_24124__b14970142814368">max_wal_senders</strong> to 1.5 to 2 times the value of <strong id="mrs_01_24124__b449935123619">Slot</strong>.</li><li id="mrs_01_24124__li1781498184015"><span id="mrs_01_24124__ph1121161132316">I</span>f the replication identifier of a PostgreSQL table is <strong id="mrs_01_24124__b6848224775">default</strong>, enable the full field completion function in the following scenarios:<ul id="mrs_01_24124__ul199356521427"><li id="mrs_01_24124__li11164241112217">Scenario 1:<p id="mrs_01_24124__p1369324212217"><a name="mrs_01_24124__li11164241112217"></a><a name="li11164241112217"></a>When a <strong id="mrs_01_24124__b178751047163112">delete</strong> operation is performed on the source database, the <strong id="mrs_01_24124__b8877124703112">delete</strong> event contains only the primary key information.
In this case, the <strong id="mrs_01_24124__b1087804733110">delete</strong> record written to Hudi has values only for the primary key, and all other service fields are <strong id="mrs_01_24124__b087994712311">null</strong>.</p>
</li><li id="mrs_01_24124__li2858144822216">Scenario 2:<p id="mrs_01_24124__p129655497225"><a name="mrs_01_24124__li2858144822216"></a><a name="li2858144822216"></a>When a single record in the database is 8 KB or larger, an <strong id="mrs_01_24124__b159389309135">update</strong> event contains only the changed fields. In this case, some fields in the Hudi data have the value <strong id="mrs_01_24124__b14512021163514">__debezium_unavailable_value</strong>.</p>
</li></ul>
<p id="mrs_01_24124__p658972682316">The related commands are as follows:</p>
<ul id="mrs_01_24124__ul75984267233"><li id="mrs_01_24124__li19597326192311">Command for querying the replication identifier of a PostgreSQL table:<p id="mrs_01_24124__p95971226152315"><a name="mrs_01_24124__li19597326192311"></a><a name="li19597326192311"></a><strong id="mrs_01_24124__b1159752617238">SELECT CASE relreplident WHEN 'd' THEN 'default' WHEN 'n' THEN 'nothing' WHEN 'f' THEN 'full' WHEN 'i' THEN 'index' END AS replica_identity FROM pg_class WHERE oid = '</strong><em id="mrs_01_24124__i259710268230">tablename</em><strong id="mrs_01_24124__b1559711264234">'::regclass;</strong></p>
</li><li id="mrs_01_24124__li25988261239">Command for enabling the full field completion function for a table:<p id="mrs_01_24124__p4598162632320"><a name="mrs_01_24124__li25988261239"></a><a name="li25988261239"></a><strong id="mrs_01_24124__b20597192612239">ALTER TABLE</strong><em id="mrs_01_24124__i3598426142313"> tablename</em><strong id="mrs_01_24124__b14598172617234"> REPLICA IDENTITY FULL;</strong></p>
</li></ul>
</li></ul>
</div></div>
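<p>The two commands in the note can be combined into a simple decision: decode the value of <strong>relreplident</strong> and, for tables that use the default setting and are affected by the scenarios above, produce the <strong>ALTER TABLE</strong> statement. The following Python sketch is illustrative only. <strong>full_identity_statement</strong> is a hypothetical helper, <strong>public.orders</strong> is a made-up table name, and the table name is inserted as-is, so it is assumed to be a trusted identifier.</p>

```python
# Codes stored in pg_class.relreplident, as decoded by the query in the note.
REPLICA_IDENTITY = {"d": "default", "n": "nothing", "f": "full", "i": "index"}


def full_identity_statement(table_name: str, relreplident: str):
    """Return the ALTER TABLE statement when the table uses the default identity.

    Returns None for any other setting. `table_name` is not escaped here,
    so it is assumed to be a trusted, already-quoted identifier.
    """
    if REPLICA_IDENTITY.get(relreplident) == "default":
        return f"ALTER TABLE {table_name} REPLICA IDENTITY FULL;"
    return None


print(full_identity_statement("public.orders", "d"))
# ALTER TABLE public.orders REPLICA IDENTITY FULL;
print(full_identity_statement("public.orders", "f"))
# None
```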
<ol id="mrs_01_24124__ol876817393330"><li id="mrs_01_24124__li1876823993318">Set <strong id="mrs_01_24124__b164459342714">wal_level = logical</strong> in the database configuration file <strong id="mrs_01_24124__b123511048572">postgresql.conf</strong> (which is stored in the <strong id="mrs_01_24124__b73591510814">data</strong> folder in the PostgreSQL installation directory by default).<pre class="screen" id="mrs_01_24124__screen4939185861318">#------------------------------------------------
#WRITE-AHEAD LOG
#------------------------------------------------
# - Settings -
<strong id="mrs_01_24124__b547319218176">wal_level = logical </strong> # minimal, replica, or logical
# (change requires restart)
#fsync = on #flush data to disk for crash safety
...</pre>
</li><li id="mrs_01_24124__li4244742183312">Restart the database service.<pre class="screen" id="mrs_01_24124__screen16178424151018"># Stop
pg_ctl stop
# Start
pg_ctl start</pre>
</li></ol>
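<p>The WAL-related settings can be sanity-checked before the database is restarted. The following Python sketch is a hedged illustration, not a CDL tool: it assumes the <strong>name = value</strong> form used in the fragment above, it interprets <strong>Slot</strong> in the recommendation for <strong>max_wal_senders</strong> as the number of replication slots (an assumption), and both function names are hypothetical.</p>

```python
import math
import re


def read_wal_level(conf_text: str):
    """Return the last uncommented wal_level value in postgresql.conf text, or None."""
    value = None
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so commented-out settings are ignored.
        match = re.match(r"\s*wal_level\s*=\s*'?(\w+)'?", line)
        if match:
            value = match.group(1)
    return value


def recommended_max_wal_senders(slot_count: int):
    """Return the (low, high) range of 1.5x to 2x the slot count, rounding up."""
    return math.ceil(slot_count * 1.5), slot_count * 2


conf = """
#wal_level = replica
wal_level = logical   # minimal, replica, or logical
"""
print(read_wal_level(conf))            # logical
print(recommended_max_wal_senders(5))  # (8, 10)
```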
</li><li id="mrs_01_24124__li6127144552014"><a name="mrs_01_24124__li6127144552014"></a><a name="li6127144552014"></a>Prerequisites for the DWS database<p id="mrs_01_24124__p91288453205"><a name="mrs_01_24124__li6127144552014"></a><a name="li6127144552014"></a>Before a synchronization task is started, both the source and target tables must exist and have the same table structure. The value of <strong id="mrs_01_24124__b1812874512200">ads_last_update_date</strong> in the DWS table is the current system time.</p>
</li><li id="mrs_01_24124__li347115587209"><a name="mrs_01_24124__li347115587209"></a><a name="li347115587209"></a>Prerequisites for ThirdPartyKafka<p id="mrs_01_24124__p94712058172017"><a name="mrs_01_24124__li347115587209"></a><a name="li347115587209"></a>The upstream source can be openGauss or OGG. Kafka topics at the source end must be consumable by Kafka in the MRS cluster.</p>
</li><li id="mrs_01_24124__li149281260215">Prerequisites for ClickHouse<p id="mrs_01_24124__p89281632115"><a name="mrs_01_24124__li149281260215"></a><a name="li149281260215"></a>You have the permissions to operate ClickHouse. For details, see <a href="mrs_01_24057.html">ClickHouse User and Permission Management</a>.</p>
</li></ul>
</li></ul>
</div>
<div class="section" id="mrs_01_24124__section7530162582911"><h4 class="sectiontitle">Data Types and Mapping Supported by CDL Synchronization Tasks</h4><p id="mrs_01_24124__p1760715415313">This section describes the data types supported by CDL synchronization tasks and the mapping between data types of the source database and Spark data types.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table14711713183217" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Mapping between PostgreSQL and Spark data types</caption><thead align="left"><tr id="mrs_01_24124__row18718135324"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.3.2.3.1.1"><p id="mrs_01_24124__p13711013103211">PostgreSQL Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.3.2.3.1.2"><p id="mrs_01_24124__p7711613113214">Spark (Hudi) Data Type</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row971161312327"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p971013173217">int2</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p1771141312329">int</p>
</td>
</tr>
<tr id="mrs_01_24124__row117171319329"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p271151313212">int4</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p47114131327">int</p>
</td>
</tr>
<tr id="mrs_01_24124__row77101343215"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p187241313326">int8</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p18721313123213">bigint</p>
</td>
</tr>
<tr id="mrs_01_24124__row137216136322"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p127291314325">numeric(p, s)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p1721213153215">decimal[p,s]</p>
</td>
</tr>
<tr id="mrs_01_24124__row197577505344"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p875819506346">bool</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p2075820502345">boolean</p>
</td>
</tr>
<tr id="mrs_01_24124__row49078015354"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p2090750143516">char</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p590717013512">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row143561821173519"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p16356192111357">varchar</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p10356132183517">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row15400152883519"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p840092863513">text</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p12400628103518">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row8612422164213"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p1761211225421">timestamptz</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p18612182212425">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row73495263424"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p10349142634218">timestamp</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p122892418449">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row20189123094216"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p418993024217">date</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p01891130174216">date</p>
</td>
</tr>
<tr id="mrs_01_24124__row118903444211"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p15897340423">json, jsonb</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p48923413421">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row116213382427"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p146263884217">float4</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p862103816429">float</p>
</td>
</tr>
<tr id="mrs_01_24124__row7324117142611"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.1 "><p id="mrs_01_24124__p232419713265">float8</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.3.2.3.1.2 "><p id="mrs_01_24124__p132410702619">double</p>
</td>
</tr>
</tbody>
</table>
</div>
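<p>Table 2 can be applied mechanically when predicting the Hudi schema of a PostgreSQL source table. The following Python sketch transcribes the table into a lookup. It is an illustration, not CDL's implementation: <strong>PG_TO_SPARK</strong> and <strong>pg_to_spark</strong> are hypothetical names, stripping a length modifier such as <strong>varchar(64)</strong> is an assumption, and the decimal type is written with parentheses, as in Spark SQL.</p>

```python
import re

# PostgreSQL -> Spark (Hudi) types, transcribed from Table 2.
PG_TO_SPARK = {
    "int2": "int", "int4": "int", "int8": "bigint",
    "bool": "boolean",
    "char": "string", "varchar": "string", "text": "string",
    "timestamptz": "timestamp", "timestamp": "timestamp", "date": "date",
    "json": "string", "jsonb": "string",
    "float4": "float", "float8": "double",
}


def pg_to_spark(pg_type: str) -> str:
    """Map a PostgreSQL type name to the Spark (Hudi) type listed in Table 2."""
    name = pg_type.strip().lower()
    numeric = re.fullmatch(r"numeric\(\s*(\d+)\s*,\s*(\d+)\s*\)", name)
    if numeric:
        precision, scale = numeric.groups()
        return f"decimal({precision},{scale})"
    # Drop a length modifier such as varchar(64) before the lookup (assumption).
    base = re.sub(r"\(.*\)$", "", name)
    if base not in PG_TO_SPARK:
        raise ValueError(f"{pg_type!r} is not listed in Table 2")
    return PG_TO_SPARK[base]


print(pg_to_spark("numeric(10, 2)"))  # decimal(10,2)
print(pg_to_spark("varchar(64)"))     # string
print(pg_to_spark("INT8"))            # bigint
```

<p>Types that do not appear in Table 2 raise <strong>ValueError</strong> in this sketch rather than being mapped by guesswork.</p>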
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table19624134354411" frame="border" border="1" rules="all"><caption><b>Table 3 </b>Mapping between MySQL and Spark data types</caption><thead align="left"><tr id="mrs_01_24124__row1062413439443"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.4.2.3.1.1"><p id="mrs_01_24124__p8624104394417">MySQL Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.4.2.3.1.2"><p id="mrs_01_24124__p17562124034813">Spark (Hudi) Data Type</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row5624243194420"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p2624043184416">int</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p9524102116432">int</p>
</td>
</tr>
<tr id="mrs_01_24124__row6624174311440"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p3624343144413">integer</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p115242215438">int</p>
</td>
</tr>
<tr id="mrs_01_24124__row7624174324416"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p124951335165412">bigint</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p16524132124319">bigint</p>
</td>
</tr>
<tr id="mrs_01_24124__row3624194318443"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p962410432448">double</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p05241621114310">double</p>
</td>
</tr>
<tr id="mrs_01_24124__row46241443104414"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p1624843144413">decimal[p,s]</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p125248219439">decimal[p,s]</p>
</td>
</tr>
<tr id="mrs_01_24124__row1033215133471"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p11332191313473">varchar</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p155241521124320">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row284819144713"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p1984919104711">char</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p1252412213433">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row7415122224715"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p20416182264714">text</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p9524121104315">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row77492377473"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p117491376474">timestamp</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p9524132124312">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row1990910405478"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p0909184015476">datetime</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p1652422194314">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row166011555474"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p6601155144715">date</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p5524192116433">date</p>
</td>
</tr>
<tr id="mrs_01_24124__row1765912514479"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p8659151144710">json</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p4524621144317">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row154787314819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.1 "><p id="mrs_01_24124__p104784324815">float</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.4.2.3.1.2 "><p id="mrs_01_24124__p95241921114314">double</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table16524113363011" frame="border" border="1" rules="all"><caption><b>Table 4 </b>Mapping between OGG (Oracle) and Spark data types</caption><thead align="left"><tr id="mrs_01_24124__row9525133323011"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.5.2.3.1.1"><p id="mrs_01_24124__p10701514153117">Oracle Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.5.2.3.1.2"><p id="mrs_01_24124__p17701114143111">Spark (Hudi) Data Type</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row1552513313305"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p3701614163116">NUMBER(3), NUMBER(5)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p87017144312">bigint</p>
</td>
</tr>
<tr id="mrs_01_24124__row175257339307"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p8701714173114">INTEGER</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p57012144315">decimal</p>
</td>
</tr>
<tr id="mrs_01_24124__row1052553333020"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p1470181417318">NUMBER(20)</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p1270111433118">decimal</p>
</td>
</tr>
<tr id="mrs_01_24124__row7525333173014"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p97016149316">NUMBER</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p07021483113">decimal</p>
</td>
</tr>
<tr id="mrs_01_24124__row452583333012"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p177061416318">BINARY_DOUBLE</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p6701014173119">double</p>
</td>
</tr>
<tr id="mrs_01_24124__row14525103383019"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p18706143314">CHAR</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p1570714193119">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row435116485306"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p1570714103117">VARCHAR</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p7701814193114">string</p>
</td>
</tr>
<tr id="mrs_01_24124__row2999115512309"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p1470101483115">TIMESTAMP, DATETIME</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p17041463118">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row1692310593305"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p16701014153119">TIMESTAMP WITH TIME ZONE</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p18701414113119">timestamp</p>
</td>
</tr>
<tr id="mrs_01_24124__row207813411314"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.1 "><p id="mrs_01_24124__p18704142311">DATE</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.5.2.3.1.2 "><p id="mrs_01_24124__p3237134316519">timestamp</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table87461539456" frame="border" border="1" rules="all"><caption><b>Table 5 </b>Mapping between Spark (Hudi) and DWS data types</caption><thead align="left"><tr id="mrs_01_24124__row97468354512"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.6.2.3.1.1"><p id="mrs_01_24124__p67463324517">Spark (Hudi) Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.6.2.3.1.2"><p id="mrs_01_24124__p774618313450">DWS Data Type</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row27461837455"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p8746113164519">int</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p20746193114510">int</p>
</td>
</tr>
<tr id="mrs_01_24124__row10746203184510"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p127464394511">long</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p1746123174517">bigint</p>
</td>
</tr>
<tr id="mrs_01_24124__row15746732455"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p117469374518">float</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p674673104513">float</p>
</td>
</tr>
<tr id="mrs_01_24124__row974612374513"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p77471034457">double</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p974715312458">double</p>
</td>
</tr>
<tr id="mrs_01_24124__row127477315457"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p1774711384514">decimal[p,s]</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p974718374510">decimal[p,s]</p>
</td>
</tr>
<tr id="mrs_01_24124__row1574712319457"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p3747193104517">boolean</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p9747133124520">boolean</p>
</td>
</tr>
<tr id="mrs_01_24124__row47473324511"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p27471931452">string</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p274723124517">varchar</p>
</td>
</tr>
<tr id="mrs_01_24124__row15341181032518"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p163428102256">date</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p2342810152515">date</p>
</td>
</tr>
<tr id="mrs_01_24124__row1295791319253"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.1 "><p id="mrs_01_24124__p89574137254">timestamp</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.6.2.3.1.2 "><p id="mrs_01_24124__p2957121312253">timestamp</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24124__table1315715459497" frame="border" border="1" rules="all"><caption><b>Table 6 </b>Mapping between Spark (Hudi) and ClickHouse data types</caption><thead align="left"><tr id="mrs_01_24124__row19157134512498"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.7.2.3.1.1"><p id="mrs_01_24124__p10226125124915">Spark (Hudi) Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.5.7.2.3.1.2"><p id="mrs_01_24124__p1157124516491">ClickHouse Data Type</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24124__row1215715457492"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p14969544162611">int</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p161571445134915">Int32</p>
</td>
</tr>
<tr id="mrs_01_24124__row18157745184915"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p496914441269">long</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p14157144514498">Int64 (bigint)</p>
</td>
</tr>
<tr id="mrs_01_24124__row12157114564912"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p14969184418262">float</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p121575458499">Float32 (float)</p>
</td>
</tr>
<tr id="mrs_01_24124__row10157174510497"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p189691144132614">double</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p18157184544911">Float64 (double)</p>
</td>
</tr>
<tr id="mrs_01_24124__row8822103015261"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p17969444102613">decimal[p,s]</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p18221030192612">Decimal(P,S)</p>
</td>
</tr>
<tr id="mrs_01_24124__row615804516495"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p1296934410262">boolean</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p18158114564917">bool</p>
</td>
</tr>
<tr id="mrs_01_24124__row75861834152615"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p696919444269">string</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p2586193492615">String (LONGTEXT, MEDIUMTEXT, TINYTEXT, TEXT, LONGBLOB, MEDIUMBLOB, TINYBLOB, BLOB, VARCHAR, CHAR)</p>
</td>
</tr>
<tr id="mrs_01_24124__row18158164554916"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p1197014441264">date</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p101581745144915">Date</p>
</td>
</tr>
<tr id="mrs_01_24124__row1666019263263"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.1 "><p id="mrs_01_24124__p1697004482615">timestamp</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.5.7.2.3.1.2 "><p id="mrs_01_24124__p8660122613267">DateTime</p>
</td>
</tr>
</tbody>
</table>
</div>
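<p>The mappings in Table 5 and Table 6 can be expressed as lookup tables, which is convenient when you prepare target table definitions by hand. The following is a minimal sketch and not part of CDL: the names <code>SPARK_TO_DWS</code>, <code>SPARK_TO_CLICKHOUSE</code>, and <code>map_spark_type</code> are hypothetical, only the primary ClickHouse type names are kept (aliases such as bigint are dropped), and the sketch assumes that <code>decimal[p,s]</code> in the tables stands for a decimal type with precision p and scale s.</p>

```python
import re

# Lookup tables transcribed from Table 5 and Table 6.
# These names are hypothetical helpers, not part of any CDL API.
SPARK_TO_DWS = {
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "varchar",
    "date": "date",
    "timestamp": "timestamp",
}

SPARK_TO_CLICKHOUSE = {
    "int": "Int32",
    "long": "Int64",
    "float": "Float32",
    "double": "Float64",
    "boolean": "bool",
    "string": "String",
    "date": "Date",
    "timestamp": "DateTime",
}

# Accepts both decimal[p,s] (as written in the tables) and decimal(p,s).
_DECIMAL = re.compile(r"^decimal\s*[\[(]\s*(\d+)\s*,\s*(\d+)\s*[\])]$")


def map_spark_type(spark_type: str, target: str) -> str:
    """Return the target type for a Spark (Hudi) type; target is 'dws' or 'clickhouse'."""
    if target == "dws":
        table, decimal_name = SPARK_TO_DWS, "decimal"
    elif target == "clickhouse":
        table, decimal_name = SPARK_TO_CLICKHOUSE, "Decimal"
    else:
        raise ValueError(f"unsupported target: {target}")

    normalized = spark_type.strip().lower()
    match = _DECIMAL.match(normalized)
    if match:
        # Assumption: precision and scale carry over unchanged.
        return f"{decimal_name}({match.group(1)},{match.group(2)})"
    if normalized not in table:
        raise KeyError(f"no mapping listed for Spark type: {spark_type}")
    return table[normalized]
```

<p>For example, <code>map_spark_type("long", "dws")</code> looks up the long row of Table 5, and a type that is not listed in the tables raises an error instead of being guessed.</p>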
</div>
<div class="section" id="mrs_01_24124__section84641320114317"><h4 class="sectiontitle">Data comparison task</h4><p id="mrs_01_24124__p890415519911">A data comparison task checks whether the data in the source database is consistent with the data in the target Hive. If the data is inconsistent, CDL can attempt to repair it. For details, see <a href="mrs_01_24775.html">Creating a CDL Data Comparison Job</a>.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24123.html">Using CDL</a></div>
</div>
</div>