=========================================================================== Migrate a MySQL/MariaDB database to Open Telekom Cloud RDS with Apache NiFi =========================================================================== `Apache NiFi `__ (**nye-fye**) is a project from the Apache Software Foundation aiming to automate the flow of data between systems in form of workflows. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. We are going to utilize these features in order to consolidate all the manual, mundane and error prone steps of an on-premises MySQL/MariaDb database migration to Open Telekom Cloud RDS in a form of a highly scalable workflow. .. note:: Historically, NiFi is based on the "*NiagaraFiles*", a system that was developed by the US National Security Agency (NSA). It was open-sourced as a part of NSA's technology transfer program in 2014. Overview ======== With zero cost in 3rd party components and in less than 15 minutes we are going to transform a highly error prone and demanding use-case, as the migration of an MySQL or MariaDB to the cloud, to a fully automated, repeatable and scalable procedure. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/placeholder-image.jpg .. image:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-etc.png :target: https://docs.otc.t-systems.com/en-us/usermanual/rds/en-us_topic_dashboard.html :alt: Relational Database Service (RDS) .. image:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-etn.png :target: https://docs.otc.t-systems.com/en-us/usermanual/rds/en-us_topic_dashboard.html .. image:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-f0m.png :target: https://docs.otc.t-systems.com/en-us/usermanual/rds/en-us_topic_dashboard.html .. image:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-f0q.png :target: https://docs.otc.t-systems.com/en-us/usermanual/rds/en-us_topic_dashboard.html .. image:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-f0u.png :target: https://docs.otc.t-systems.com/en-us/usermanual/rds/en-us_topic_dashboard.html .. seealso:: `Github Repository `_ `Apache Nifi Workflow Template `_ Provision a MySQL instance in RDS ================================== If you don't have an RDS instance in place, let's create one in order to demonstrace this use-case. Under Relational Database Service in Open Telekom Cloud Console, .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20221004-fw6.png choose *Create DB Instance* and go through the creation wizard: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220926-9b5.png 1. Choose the basic details of your database engine. You need to stick for this use-case to MySQL engine v8.0. Whether you create a single instance database or a replicated one with primary and standby instances is fairly irrelevant in regards to our use-case. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220926-9c5.png 2. Create a new Security Group that will allow port 3306 in its inbound ports, and assign this Security Group as Security Group of the ECS instances of your database (still in the database creation wizard) .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220926-9d4.png .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220926-9h7.png 3. After the database is successfully created, enable SSL support: .. warning:: It's not recommended transfering production data without having SSL enalbed Provision an Apache Nifi Server ================================ We are going to deploy the Apache NiFi server as a **docker container** using the following command (replace first the required credentials with the ones of your choice): .. code-block:: shell docker run --name nifi \ -p 8443:8443 \ -d \ -e SINGLE_USER_CREDENTIALS_USERNAME={{USERNAME}} \ -e SINGLE_USER_CREDENTIALS_PASSWORD={{PASSWORD}} \ apache/nifi:latest and then open your browser and navigate to the following URL address: .. code-block:: shell https://localhost:8443/nifi/ enter your credentials and you will land on an empty workflow canvas: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-lt4.png Create the migration workflow ============================= 1. Add a **Processor** of type **GenerateFlowFile**, as the entry point of our workflow (as is instructed in the following picture): .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-lvz.png 2. Add a **Processor** of type **ExecuteStreamCommand**, as the step that will dump and export our source database — and call it ExportMysqlDump: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-m0k.png and let’s configure the external command we want this component to execute: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-m2m.png go to **Properties** from the tab menu: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-m44.png As **Command Path** set : .. code-block:: shell /usr/bin/mysqldump and as **Command Arguments** fill in the mysql-client arguments, but separated by a semicolon (replace the highlighted values with your own): .. code-block:: shell -u;root;-P;3306;-h;{{HOSTNAME_OR_CONTAINER_IP}};-p{{PASSWORD}}; --databases;employees;--routines;--triggers;--single-transaction; --order-by-primary;--gtid;--force Connect the two Processors by dragging a connector line from the first to the latter. You should be able to observe now that a **Queue** component is injected between them: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-m8y.png We will see later how these Queues contribute to the workflow and how we can use them to gain useful insights or debug our workflows. 3. Open Telekom Cloud RDS for MySql will **not** permit SUPER privileges or the SET_USER_ID privilege to any user, and this will lead to the following error when you will try to run the migration workflow for the first time: .. code-block:: shell ERROR 1227 (42000) at line 295: Access denied; you need (at least one of) the SUPER or SET_USER_ID privilege(s) for this operation The error above may occur while executing CREATE VIEW, FUNCTION, PROCEDURE, TRIGGER OR EVENT with DEFINER statements as part of importing a dump file or running a script. In order to preactively mitigate this situation, we are going to add a second **Processor** of type **ExecuteStreamCommand**. This Processor (let’s call it ReplaceDefinersCommand) will edit the dump file script and replace the DEFINER values with the appropriate user with admin permissions who is going to perform the import or execute the script file. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-ni2.png As **Command Path** set : .. code-block:: shell sed and as **Command Arguments** (*in one line*): .. code-block:: shell -e;"s/DEFINER[ ]*=[ ]*[^*]*\*/\*/"; -e;"s/DEFINER[ ]*=.*FUNCTION/FUNCTION/"; -e;"s/DEFINER[ ]*=.*PROCEDURE/PROCEDURE/"; -e;"s/DEFINER[ ]*=.*TRIGGER/TRIGGER/"; -e;"s/DEFINER[ ]*=.*EVENT/EVENT/" Connect the two ExecuteCommandStream Processors, by dragging a connector line from the first to the second. You should be able to observe now that a second Queue component is added between them on the canvas. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-ngs.png 4. Add a third **Processor** of type **ExecuteStreamCommand** (same drill as with ExportMysqlDump). This step will import the dump to our target database — call it ImportMysqlDump. Let’s configure it: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220810-mf6.png As **Command Path** set : .. code-block:: shell /usr/bin/mysql and as **Command Arguments** (*in one line*): .. code-block:: shell -u;root;-P;3306;-h;{{EIP}};-p{{PASSWORD}};--ssl-ca;/usr/bin/ca-bundle.pem;--force Connect the ReplaceDefinersCommand with this new Processor, by dragging a connector line from the first to the second. You should be able to observe now that a second Queue component is added between them on the canvas: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-nfj.png 5. Add a **Processor** of type **LogAttribute**; this component will emit attributes of the FlowFile for a predefined log level. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-dsr.png Then drag a connection between the ExportMysqlDump and the LogAttribute Processors, and in the Create Connection popup let’s define two new relationships: *original* and *nonzero status*. The former is the original queue message that was processed from the Processor and the latter bears the potential errors (*non zero results*) that were thrown during this step of the workflow. Every relationship will inject a dedicated queue in the workflow. Repeat the same steps for the ReplaceDefinersCommand Processor. For ImportMySqlDump and LogAttribute Processors, activate all 3 available relationship options. The output stream will log the successful results of our import workflow step. .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-dum.png Eventually, our LogAttribute Processor and its dependencies should now look like this on the canvas: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-nk1.png 6. Start the Processors. As you will notice on the left-hand upper corner of every Processor on the canvas appears a stop sign. That means that the Processors will not execute any commands even if we kick off a new instance of the workflow. In order to start them press, for every single one of them — except LogAttribute, the start button marked with blue in the picture below: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-e7c.png Configure the Apache Nifi Server ================================ At this point we are not ready yet to run our workflow. The Apache Nifi server is lacking two additional resources. The two ExecuteStreamCommand Processors will execute an export and import from and to remote MySQL instances using the mysql-client, but the Apache NiFi container doesn’t have any knowledge of this package. We have to connect to our container and install the required client. Let's connect first to the Apache Nifi container as root: .. code-block:: shell docker exec -it -u 0 nifi /bin/bash and install the client (in this case is the *mariadb-client* package): .. code-block:: shell apt-get update -y apt-get install -y mariadb-client A quick sanity check to make sure that everything is in place. For that matter go to `/usr/bin/` and make sure you that `mysqldump` and `mysql` are properly symlinked: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-eii.png Next we have to copy to the Apache Nifi container the SSL certificate we downloaded from the Open Telekom Cloud console. .. code-block:: shell docker cp ca-bundle.pem nifi:/usr/bin .. attention:: For the time being, let's skip the step above in order to simulate an error in the migration workflow and we will come back later to this. Start a Migration Workflow ========================== Open the cascading menu of the *GenerateFlowFile* component and click *Run Once*: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-f0t.png The current active Processor will be marked with this sign on right-hand upper corner on the canvas: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-f32.png Let’s see what happened and if the migration went through, and if no how could we debug and trace the source of our problem. The canvas now will be updated with some more data in every *Processor* and *Queue*: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-nsn.png *GenerateFlowFile* Processor is informing us that has sent 1 request down the pipeline (*Out* 1 — in box marked in blue). The *ExecuteMysqlDump* Processor ran successfully and wrote out a dump in the size of 160.59MB. Its logging queues show us that we have a new entry in *original* and zero entries in *nonzero status*. (The latter indicates that the Processor ran **without any error**). Let’s see what was written in the original queue. Open the queue: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-fap.png and under the *Properties* tab of the Queue, we can see which command was executed by our Processor: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-fc21.png Now let's focus on the second ExecuteStreamCommand Processor, the one that is responsible to import the dump to the target database. We can see that it received an input of 160.59MB (that is our dump file, generated from the previous Processor); it pushed it down in the *original* queue but it seems that migration didn’t go through as planned, because we have items in the *nonzero status* queue. As a first step finding the culprit, we will inspect in the original queue (open the *List Queue* and pick the element that corresponds to this very workflow instance under the *Details* tab). We can either inspect the generated dump file that was handed over by the ExportMysqlDump Processor by either viewing or download it, .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-fhz.png or inspect the command that was executed to see if there is a helpful error message (in our case there is one): .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-fhm1.png A faster way though, figuring out what went wrong, is hovering over the red sign (that will appear in case of error) in the upper right-hand corner of our Processor that threw the error: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220812-flv.png Now that we saw how we can, in principle, debug and investigate errors during the execution of our workflows, go back to previous chapter guidelines and, this time, do copy the SSL certificate to the Apache Nifi container. We are now set to start a new migration instance. You will observe that after a while the *ImportMysqlDump* Processor goes in execution mode, for the small sign on the right upper-hand corner that indicates the active threads currently running on this component. After a while, when the workflow will: * not have any more active threads in any processor * have an additional message in the outcome queue of the ImportMysqlDump Processor * have no additional messages in the nonzero status queue of the ImportMysqlDump Processor then check your database — the migration would have successfully completed: .. figure:: https://architecture-center-poc-images.obs.eu-de.otc.t-systems.com/rds-migration/SCR-20220926-bhx.png References ========== .. seealso:: `Relational Database Service: Accessing RDS `_ `Database Services Overview with RDS Deep Dive `_