"from-config-values": { "configs": [ { "inputs": [ { "name": "fromJobConfig.inputDirectory", "value": "/hdfsfrom/from_hdfs_est.csv" }, { "name": "fromJobConfig.inputFormat", "value": "CSV_FILE" }, { "name": "fromJobConfig.columnList", "value": "1" }, { "name": "fromJobConfig.fieldSeparator", "value": "," }, { "name": "fromJobConfig.quoteChar", "value": "false" }, { "name": "fromJobConfig.regexSeparator", "value": "false" }, { "name": "fromJobConfig.firstRowAsHeader", "value": "false" }, { "name": "fromJobConfig.encodeType", "value": "UTF-8" }, { "name": "fromJobConfig.fromCompression", "value": "NONE" }, { "name": "fromJobConfig.compressedFileSuffix", "value": "*" }, { "name": "fromJobConfig.splitType", "value": "FILE" }, { "name": "fromJobConfig.useMarkerFile", "value": "false" }, { "name": "fromJobConfig.fileSeparator", "value": "|" }, { "name": "fromJobConfig.filterType", "value": "NONE" } ], "name": "fromJobConfig" } ] }
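The example above is a fragment of a larger job request body. Each parameter in the table below is sent as a name/value pair inside the fromJobConfig group, and every value is written as a string, including booleans such as "false". The following Python sketch rebuilds that fragment from a flat dictionary. build_from_config_values is a hypothetical helper, not part of a CDM SDK; it only produces the JSON and does not submit a job.

```python
import json


def build_from_config_values(params: dict) -> dict:
    """Wrap flat fromJobConfig.* parameters in the from-config-values structure.

    Assumption: all values are strings, mirroring the example request body.
    """
    inputs = [{"name": name, "value": value} for name, value in params.items()]
    return {"configs": [{"inputs": inputs, "name": "fromJobConfig"}]}


# The same parameters, in the same order, as the example above.
params = {
    "fromJobConfig.inputDirectory": "/hdfsfrom/from_hdfs_est.csv",
    "fromJobConfig.inputFormat": "CSV_FILE",
    "fromJobConfig.columnList": "1",
    "fromJobConfig.fieldSeparator": ",",
    "fromJobConfig.quoteChar": "false",
    "fromJobConfig.regexSeparator": "false",
    "fromJobConfig.firstRowAsHeader": "false",
    "fromJobConfig.encodeType": "UTF-8",
    "fromJobConfig.fromCompression": "NONE",
    "fromJobConfig.compressedFileSuffix": "*",
    "fromJobConfig.splitType": "FILE",
    "fromJobConfig.useMarkerFile": "false",
    "fromJobConfig.fileSeparator": "|",
    "fromJobConfig.filterType": "NONE",
}

body = {"from-config-values": build_from_config_values(params)}
print(json.dumps(body, indent=2))
```

Because Python dictionaries preserve insertion order, the inputs are serialized in the order they were added, which keeps the output easy to compare with the example.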
Parameter | Mandatory | Type | Description |
---|---|---|---|
fromJobConfig.inputDirectory | Yes | String | Path of the data to be extracted. For example, /data_dir. |
fromJobConfig.inputFormat | Yes | Enumeration | Format of the files to be transferred. Supported formats include CSV_FILE and BINARY_FILE. If you select BINARY_FILE, the migration destination must also be a file system. |
fromJobConfig.columnList | No | String | Numbers of the columns to be extracted, separated by & in ascending order. For example, 1&3&5. |
fromJobConfig.lineSeparator | No | String | Line feed character used in the file. By default, the system automatically recognizes \\n, \\r, and \\r\\n. You can configure custom characters; spaces and carriage returns must be URL-encoded. You can also configure this value by editing the job JSON, in which case URL encoding is not required. |
fromJobConfig.fieldSeparator | No | String | Field delimiter. This parameter is valid only when the file format is CSV_FILE. The default value is a comma (,). |
fromJobConfig.quoteChar | No | Boolean | Whether to use a quote character. If this parameter is set to true, field delimiters enclosed in quote characters are treated as part of the string value. Currently, CDM uses the double quotation mark (") as the default quote character. |
fromJobConfig.regexSeparator | No | Boolean | Whether to use a regular expression to separate fields. This parameter is valid only when the file format is CSV_FILE. |
fromJobConfig.encodeType | No | String | Encoding type. For example, UTF-8 or GBK. |
fromJobConfig.firstRowAsHeader | No | Boolean | Whether to treat the first line as the header row. This parameter is valid only when the file format is CSV_FILE. By default, when you migrate a CSV file to a table, CDM writes all lines to the table. If this parameter is set to true, CDM uses the first line of the CSV file as the header row and does not write it to the destination table. |
fromJobConfig.fromCompression | No | Enumeration | Compression format. Only source files in the specified compression format are transferred. NONE indicates that files in all formats are transferred. |
fromJobConfig.compressedFileSuffix | No | String | File name extension of the files to be decompressed. In a batch of files, only files with this extension are decompressed; other files are transferred in their original format. If you enter * or leave this parameter blank, all files are decompressed. |
fromJobConfig.splitType | No | Enumeration | Whether to split the source by file or by size. For example, FILE. If HDFS files are split, each shard is treated as a file. |
fromJobConfig.useMarkerFile | No | Boolean | Whether to use a marker file to start the job. The job starts only when the marker file exists in the source path; otherwise, the job is suspended for the period specified by fromJobConfig.waitTime. |
fromJobConfig.markerFile | No | String | Name of the marker file that starts the job. For example, ok.txt. Once a marker file is specified, the job is executed only when that file exists in the source path. If no marker file is specified, this function is disabled by default. |
fromJobConfig.fileSeparator | No | String | File separator. If you enter multiple file paths in fromJobConfig.inputDirectory, CDM uses this separator to split them. The default value is a vertical bar (\|). |
fromJobConfig.filterType | No | Enumeration | Filter type. Possible values include NONE (no filtering), WILDCARD (configure fromJobConfig.pathFilter and fromJobConfig.fileFilter), and the time filter (configure fromJobConfig.startTime and fromJobConfig.endTime). |
fromJobConfig.pathFilter | No | String | Path filter, configured when the filter type is WILDCARD. It filters file directories. For example, \*input. |
fromJobConfig.fileFilter | No | String | File filter, configured when the filter type is WILDCARD. It filters files in the specified directory. Use commas (,) to separate multiple patterns. For example, \*.csv,\*.txt. |
fromJobConfig.startTime | No | String | If you set Filter Type to Time Filter and specify a point in time for this parameter, only files modified at or after that time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files modified within the last 90 days are migrated. |
fromJobConfig.endTime | No | String | If you set Filter Type to Time Filter and specify a point in time for this parameter, only files modified before that time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only files modified before the current time are migrated. |
fromJobConfig.createSnapshot | No | Boolean | If this parameter is set to true, CDM creates a snapshot of the source directory before reading files from HDFS and then migrates the data in the snapshot. A snapshot cannot be created for a single file. Only the HDFS administrator can create a snapshot. The snapshot is deleted after the CDM job is completed. |
fromJobConfig.formats | No | Data structure | Time format. This parameter is mandatory only when fromJobConfig.inputFormat is set to CSV_FILE and the file contains time fields. For details, see Description of the fromJobConfig.formats parameter. |
fromJobConfig.decryption | No | Enumeration | This parameter is available only when fromJobConfig.inputFormat is set to BINARY_FILE. It specifies whether to decrypt encrypted files before export and which decryption method to use. |
fromJobConfig.dek | No | String | Data decryption key: a string of 64 hexadecimal digits that must match the data encryption key (toJobConfig.dek) configured during encryption. If the keys do not match, the system does not report an error, but the decrypted data is incorrect. |
fromJobConfig.iv | No | String | Initialization vector required for decryption: a string of 32 hexadecimal digits that must match the initialization vector (toJobConfig.iv) configured during encryption. If the initialization vectors do not match, the system does not report an error, but the decrypted data is incorrect. |
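Several constraints in the table can be checked on the client before a job is submitted: the ascending column list, the yyyy-MM-dd HH:mm:ss time format, the inclusive start and exclusive end of the time filter, comma-separated wildcard file filters, and the hexadecimal key and IV lengths. The Python sketch below is written under stated assumptions. The helper names are hypothetical and not part of any CDM SDK. Macro values such as ${timestamp(...)} are skipped because CDM resolves them. Wildcard matching uses Python's shell-style fnmatchcase, which may differ in detail from CDM's own matcher.

```python
import re
from datetime import datetime
from fnmatch import fnmatchcase
from typing import List, Optional

HEX_DIGITS = re.compile(r"[0-9a-fA-F]+")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python spelling of yyyy-MM-dd HH:mm:ss


def check_column_list(value: str) -> List[int]:
    """Parse fromJobConfig.columnList, e.g. "1&3&5".

    Assumption: "ascending" means strictly ascending, so duplicates are rejected.
    """
    columns = [int(part) for part in value.split("&")]
    if any(a >= b for a, b in zip(columns, columns[1:])):
        raise ValueError(f"column numbers must be in ascending order: {value!r}")
    return columns


def check_time(value: str) -> Optional[datetime]:
    """Parse a literal startTime/endTime; return None for macros CDM resolves itself."""
    if value.startswith("${"):
        return None
    return datetime.strptime(value, TIME_FORMAT)


def in_time_window(mtime: datetime, start: Optional[datetime], end: Optional[datetime]) -> bool:
    """startTime is inclusive (at or after); endTime is exclusive (before)."""
    if start is not None and mtime < start:
        return False
    if end is not None and mtime >= end:
        return False
    return True


def matches_file_filter(file_name: str, file_filter: str) -> bool:
    """Assumed WILDCARD semantics: comma-separated shell-style patterns such as "*.csv,*.txt"."""
    return any(fnmatchcase(file_name, p.strip()) for p in file_filter.split(","))


def check_hex(value: str, digits: int, name: str) -> None:
    """fromJobConfig.dek needs 64 hexadecimal digits; fromJobConfig.iv needs 32."""
    if len(value) != digits or not HEX_DIGITS.fullmatch(value):
        raise ValueError(f"{name} must be exactly {digits} hexadecimal digits")


# Usage with placeholder values (not real keys).
print(check_column_list("1&3&5"))                     # [1, 3, 5]
print(matches_file_filter("data.csv", "*.csv,*.txt"))  # True
check_hex("0" * 64, 64, "fromJobConfig.dek")
check_hex("0" * 32, 32, "fromJobConfig.iv")
```

These checks only catch malformed values early. As the table notes, a dek or iv of the right length that does not match the encryption side still decrypts silently to incorrect data, so no local check can replace keeping the two configurations in sync.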