Single-Table Concurrent Write
Hudi Single-Table Concurrent Write Solution
- Uses an external service (ZooKeeper or Hive MetaStore) as the distributed mutex lock service.
- Multiple writers can write data files concurrently, but commits are serialized: each commit operation is encapsulated in a transaction.
- During the commit, the system performs a conflict check. If the list of files modified by the current commit overlaps with the files modified by any commit completed after the current writer's instant time, the commit fails and the write is invalid.
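The conflict check described above amounts to a file-list overlap test. The following is an illustrative Python sketch, not Hudi's actual implementation; the function name and inputs are hypothetical:

```python
def has_commit_conflict(my_modified_files, files_from_later_commits):
    """Return True if this commit touches any file that was already
    modified by a commit completed after this writer's instant time."""
    return bool(set(my_modified_files) & set(files_from_later_commits))

# Two writers started from the same base instant; the other writer
# committed first and modified file "part-0002".
print(has_commit_conflict({"part-0001", "part-0002"}, {"part-0002"}))  # True: commit fails
print(has_commit_conflict({"part-0001"}, {"part-0003"}))               # False: commit succeeds
```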
Precautions for Using the Concurrency Mechanism
- For insert and bulk_insert operations, the Hudi concurrency mechanism cannot guarantee that the table's primary key remains unique after data is written. You need to ensure primary key uniqueness yourself.
- Because concurrent write operations can complete at different points in time, incremental queries may consume data and advance checkpoints out of order.
- Concurrent write is supported only after this feature is enabled.
How to Use the Concurrency Mechanism
- Enable the concurrent write mechanism.
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
- Set the lock provider.
Hive MetaStore:
hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider
hoodie.write.lock.hivemetastore.database=<database_name>
hoodie.write.lock.hivemetastore.table=<table_name>
ZooKeeper:
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=<zookeeper_url>
hoodie.write.lock.zookeeper.port=<zookeeper_port>
hoodie.write.lock.zookeeper.lock_key=<table_name>
hoodie.write.lock.zookeeper.base_path=<table_path>
For more parameters, see Configuration Reference.
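The settings above can be passed as write options when writing a DataFrame through the Hudi datasource. The following is a hedged PySpark sketch: the database, table, and path values are placeholders, and `df` is assumed to be an existing Spark DataFrame; it uses the Hive MetaStore lock provider shown above.

```python
# Hudi write options enabling optimistic concurrency control with a
# Hive MetaStore-based lock (database/table names are placeholders).
hudi_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider":
        "org.apache.hudi.hive.HiveMetastoreBasedLockProvider",
    "hoodie.write.lock.hivemetastore.database": "default",
    "hoodie.write.lock.hivemetastore.table": "my_hudi_table",
}

def write_concurrently(df, base_path):
    """Append to a Hudi table with the concurrent-write options applied."""
    (df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save(base_path))
```

For the ZooKeeper lock provider, replace the `hoodie.write.lock.*` entries with the ZooKeeper settings listed above.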
Parent topic: Data Management and Maintenance