During table or file migration, CDM uses a delimiter to separate the fields in CSV files. A delimiter does not work for complex semi-structured data, however, because the field values themselves can contain the delimiter. In such cases, you can use a regular expression to separate the fields.
The regular expression is configured in Source Job Configuration. The migration source must be an object storage service or a file system, and File Format must be set to CSV. Each capturing group in the expression, in order, produces one column, as the parsing results below show.
Log sample:
2018-01-11 08:50:59,001 INFO [org.apache.sqoop.core.SqoopConfiguration.configureClassLoader(SqoopConfiguration.java:251)] Adding jars to current classloader from property: org.apache.sqoop.classpath.extra

Regular expression:
^(\d.*\d) (\w*) \[(.*)\] (\w.*).*

Parsing result:

Column Number | Example Value |
---|---|
1 | 2018-01-11 08:50:59,001 |
2 | INFO |
3 | org.apache.sqoop.core.SqoopConfiguration.configureClassLoader(SqoopConfiguration.java:251) |
4 | Adding jars to current classloader from property: org.apache.sqoop.classpath.extra |
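
The following sketch illustrates why a plain delimiter fails on the Sqoop log sample above and how the capturing groups map to columns. It uses Python's `re` module purely for illustration; the engine CDM uses is not stated here, and the sketch assumes it treats this pattern the same way. The helper name `split_fields` is hypothetical.

```python
import re

# Sample line and pattern copied from the example above.
LINE = ("2018-01-11 08:50:59,001 INFO "
        "[org.apache.sqoop.core.SqoopConfiguration.configureClassLoader"
        "(SqoopConfiguration.java:251)] "
        "Adding jars to current classloader from property: "
        "org.apache.sqoop.classpath.extra")
PATTERN = r"^(\d.*\d) (\w*) \[(.*)\] (\w.*).*"


def split_fields(pattern, line):
    """Return the capture groups as a list of column values, or None if no match."""
    m = re.match(pattern, line)
    return list(m.groups()) if m else None


# Splitting on spaces cuts the timestamp and the message into many pieces.
by_space = LINE.split(" ")
# The regular expression yields one value per capturing group.
by_regex = split_fields(PATTERN, LINE)

print("fields when splitting on spaces:", len(by_space))
for number, value in enumerate(by_regex, start=1):
    print(number, value)
```

Splitting on spaces produces far more than four fields because both the timestamp and the message contain spaces, whereas the pattern returns the four columns listed in the table.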

Log sample:
2018-01-11 08:51:06,156 INFO [org.apache.sqoop.audit.FileAuditLogger.logAuditEvent(FileAuditLogger.java:61)] user=sqoop.anonymous.user ip=189.xxx.xxx.75 op=show obj=version objId=x

Regular expression:
^(\d.*\d) (\w*) \[(.*)\] user=(\w.*) ip=(\w.*) op=(\w.*) obj=(\w.*) objId=(.*).*

Parsing result:

Column Number | Example Value |
---|---|
1 | 2018-01-11 08:51:06,156 |
2 | INFO |
3 | org.apache.sqoop.audit.FileAuditLogger.logAuditEvent(FileAuditLogger.java:61) |
4 | sqoop.anonymous.user |
5 | 189.xxx.xxx.75 |
6 | show |
7 | version |
8 | x |
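
In the audit log sample above, the literal keys (`user=`, `ip=`, and so on) sit outside the capturing groups, so only the values become columns. A minimal sketch of that behavior, again using Python's `re` for illustration only; the column names in `COLUMNS` are hypothetical labels chosen for this example and are not defined by CDM.

```python
import re

# Sample line and pattern copied from the example above.
LINE = ("2018-01-11 08:51:06,156 INFO "
        "[org.apache.sqoop.audit.FileAuditLogger.logAuditEvent"
        "(FileAuditLogger.java:61)] "
        "user=sqoop.anonymous.user ip=189.xxx.xxx.75 op=show obj=version objId=x")
PATTERN = (r"^(\d.*\d) (\w*) \[(.*)\] user=(\w.*) ip=(\w.*) "
           r"op=(\w.*) obj=(\w.*) objId=(.*).*")

# Hypothetical column names, used only to label the eight groups.
COLUMNS = ["time", "level", "source", "user", "ip", "op", "obj", "obj_id"]

match = re.match(PATTERN, LINE)
# Pair each name with the group at the same position (column 1 = group 1, ...).
record = dict(zip(COLUMNS, match.groups())) if match else {}

for name, value in record.items():
    print(f"{name}: {value}")
```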

Log sample:
11-Jan-2018 09:00:06.907 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name: Linux

Regular expression:
^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).*

Parsing result:

Column Number | Example Value |
---|---|
1 | 11-Jan-2018 09:00:06.907 |
2 | INFO |
3 | main |
4 | org.apache.catalina.startup.VersionLoggerListener.log |
5 | OS Name: Linux |
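
The fourth group in the pattern above uses the character class `[\w\.]*` because the logger name contains dots, which `\w` alone does not match. The sketch below compares the documented pattern with a variant that uses `(\w*)` instead; the variant is constructed here only to show the difference, and Python's `re` is assumed to behave like the engine CDM uses for these patterns.

```python
import re

# Sample line and pattern copied from the example above.
LINE = ("11-Jan-2018 09:00:06.907 INFO [main] "
        "org.apache.catalina.startup.VersionLoggerListener.log OS Name: Linux")
WITH_CLASS = r"^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).*"
# Illustrative variant: \w* cannot cross the dots in the logger name.
WITHOUT_CLASS = r"^(\d.*\d) (\w*) \[(.*)\] (\w*) (\w.*).*"

good = re.match(WITH_CLASS, LINE)
bad = re.match(WITHOUT_CLASS, LINE)

print(good.groups() if good else "no match")
print(bad.groups() if bad else "no match")
```

The variant finds no match at all, because `\w*` stops at the first dot and the pattern then requires a space.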

Log sample:
[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0

Regular expression:
^\[(.*)\] (\w*) (\w*) (.*).*

Parsing result:

Column Number | Example Value |
---|---|
1 | 08/Jan/2018 20:59:07 |
2 | settings |
3 | INFO |
4 | Welcome to Hue 3.9.0 |
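
In the Hue log sample above there is a space before the closing bracket, and the first group `(.*)` captures it along with the timestamp. The sketch below makes that visible and shows one possible variant that leaves the padding out of column 1. The `TRIMMED` pattern is an assumption added for illustration, not part of the documented expression, and whether CDM trims such whitespace on import is not stated here.

```python
import re

# Sample line and pattern copied from the example above.
LINE = "[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0"
ORIGINAL = r"^\[(.*)\] (\w*) (\w*) (.*).*"
# Assumed variant: a lazy group followed by \s* keeps the padding outside column 1.
TRIMMED = r"^\[(.*?)\s*\] (\w*) (\w*) (.*).*"

original_cols = re.match(ORIGINAL, LINE).groups()
trimmed_cols = re.match(TRIMMED, LINE).groups()

# repr() makes the trailing space visible.
print(repr(original_cols[0]))  # '08/Jan/2018 20:59:07 '
print(repr(trimmed_cols[0]))   # '08/Jan/2018 20:59:07'
```

The remaining three columns are identical for both patterns.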

Log sample:
[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] [pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) OpenSSL/1.0.1t configured -- resuming normal operations

Regular expression:
^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*

Parsing result:

Column Number | Example Value |
---|---|
1 | Mon Jan 08 20:43:51.854334 2018 |
2 | mpm_event:notice |
3 | pid 36465:tid 140557517657856 |
4 | AH00489: Apache/2.4.12 (Unix) OpenSSL/1.0.1t configured -- resuming normal operations |
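
Each expression above fits one log layout only, so it can help to test a pattern against a few sample lines before configuring it. The sketch below is one way to do that offline in Python; the helper `check_lines` is hypothetical, and how CDM itself handles lines that do not match the expression is not covered here.

```python
import re

# Pattern copied from the Apache example above.
APACHE_PATTERN = r"^\[(.*)\] \[(.*)\] \[(.*)\] (.*).*"
LINES = [
    # Apache sample from above: expected to match.
    "[Mon Jan 08 20:43:51.854334 2018] [mpm_event:notice] "
    "[pid 36465:tid 140557517657856] AH00489: Apache/2.4.12 (Unix) "
    "OpenSSL/1.0.1t configured -- resuming normal operations",
    # Hue sample from above: a different layout, expected not to match.
    "[08/Jan/2018 20:59:07 ] settings INFO Welcome to Hue 3.9.0",
]


def check_lines(pattern, lines):
    """Return (parsed rows, 1-based numbers of the lines that did not match)."""
    compiled = re.compile(pattern)
    rows, rejected = [], []
    for number, line in enumerate(lines, start=1):
        m = compiled.match(line)
        if m:
            rows.append(m.groups())
        else:
            rejected.append(number)
    return rows, rejected


rows, rejected = check_lines(APACHE_PATTERN, LINES)
print("parsed rows:", len(rows))
print("lines that did not match:", rejected)
```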