Resolved Issues
Resolved issues in MRS 3.1.2-LTS.2.14:
- MRS Manager
- The standby OMS node reported an alarm indicating that the FMS resource was abnormal.
- Subsequent capacity expansion failed due to residual IP addresses in the HOSTS_OS_PATCH_STATE table.
- CES monitoring was inconsistent with YARN monitoring.
- Active/standby OMS switchovers were frequent.
- The host resource overview for a specified period could not be viewed because the monitoring data was empty.
- The disk monitoring metrics were incorrectly calculated.
- Component
- Active/standby switchovers of YARN ResourceManager were frequent.
- The NodeManager health check of YARN was too sensitive.
- The YARN health check incorrectly collected the health status of standby nodes. As a result, an alarm indicating that the service was unavailable was reported.
- LDAPServer data could not be synchronized.
- Hive execution failed after the MRS 3.1.2-LTS.2.6 patch was installed.
- A thread leak occurred when HiveServer connected to Guardian.
- Hive column values that were too long failed to be written to an ORC file.
- It took a long time to clear temporary files after Hive tasks failed or terminated abnormally.
- Hive failed to be started because external metadata had been configured for Hive.
- The /var/log/ directory was full because the hiveserver.out log of Hive was not compressed.
- It took a long time to add fields to Hive partition tables.
- The rand function was expected to generate random numbers ranging from 0 to 1 but generated only values around 0.72.
- After the WebHCat process of Hive was killed, it could not be automatically started and no alarm was reported.
- An exception occurred when Kafka automatically restarted after Kerberos authentication failed.
- The Spring packages in the Hudi and Spark directories were incompatible.
- After a quota was configured for ZooKeeper, an alarm indicating that the top-level quota failed to be set was still displayed.
- The client IP address needed to be printed in the logs of the old Guardian instance.
- When MemArtsCC used the TPCDS test suite to write 10 TB data, cc-sidecar restarted repeatedly during task running.
- The cc-sidecar process became faulty after the MemArtsCC bare metal server had been running stably for a long time.
- Residual files needed to be quickly cleared when Spark jobs failed in the architecture with decoupled storage and compute.
- Spark printed error logs.
- The JobHistory process of Spark was suspended and did not perform self-healing, and no alarm was reported.
- The loaded partition was null when speculative execution was enabled for Spark.
- During fault injection, the Spark JDBCServer process entered the Z state: (1) the process did not perform self-healing; (2) no process exception alarm was generated; (3) Spark tasks failed to be submitted, and no alarm was reported for unavailable Spark applications.
- After the JDBC process of Spark was killed, self-healing was performed within 7 minutes but no alarm was reported, leaving reliability risks.
- The JDBCServer process of Spark was suspended, the process did not perform self-healing, no alarm was reported, and Spark applications failed to be submitted.
- No event was reported when Spark stopped the JDBCServer instance, and the JDBCServer.log file contained a warning indicating that the event failed to be reported.
- Some Spark jobs could not run due to Spring package conflicts after the 2.10 patch was installed.
- After the Spark JobHistory process entered the Z state, the process disappeared unexpectedly and did not perform self-healing. In addition, no alarm was reported, leaving reliability risks.
- After the Spark JobHistory process was killed, the process automatically recovered within 5 minutes and no alarm was reported.
- The JAR package on the Spark server was not replaced after Spark2x patch installation.
- Spark failed to write data to event logs.