If a training job fails, the error message "No such file or directory" is displayed in the logs.
If a training input path is unreachable, the error message "No such file or directory" is displayed.
If a training boot file is unavailable, the error message "No such file or directory" is displayed.
When using ModelArts, store your data in an OBS bucket. However, the training code cannot read data directly from an OBS path while it is running.
The reason is as follows:
After a training job is created, the training performance is poor if the running container is directly connected to OBS. To prevent this issue, the system automatically downloads the training data to the local path of the running container. Therefore, an error occurs if an OBS path is used in training code. For example, if the OBS path to the training code is obs://bucket-A/training/, the training code will be automatically downloaded to ${MA_JOB_DIR}/training/.
More generally, suppose the OBS path to the training code is obs://bucket-A/XXX/{training-project}/, where {training-project} is the name of the folder in which the training code is stored. During training, the system automatically downloads the contents of the {training-project} folder from OBS to the local path ${MA_JOB_DIR}/{training-project}/ in the training container.
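The mapping described above can be sketched in a few lines of Python. This is only an illustration of the rule, not ModelArts code: the helper name local_code_dir is hypothetical, and the fallback value used when MA_JOB_DIR is not set is a placeholder for running the sketch outside a training container.

```python
import os
import posixpath


def local_code_dir(obs_path: str, job_dir: str) -> str:
    """Return the container directory that the code under obs_path is downloaded to."""
    if not obs_path.startswith("obs://"):
        raise ValueError(f"not an OBS path: {obs_path!r}")
    # The last-level directory of the OBS path becomes the local folder name.
    last_level = posixpath.basename(obs_path.rstrip("/"))
    return posixpath.join(job_dir, last_level)


# MA_JOB_DIR is set inside the training container; the fallback below is a
# placeholder (assumption) so that the sketch also runs on a local machine.
job_dir = os.environ.get("MA_JOB_DIR", "/tmp/ma_job_dir")
print(local_code_dir("obs://bucket-A/training/", job_dir))
```

Training code should then open files under the returned local directory rather than under the obs:// path.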
If the affected path is a path to the training data, perform the following operations to resolve this issue (see "input and output configurations" for details):
Code developed locally must be uploaded to the ModelArts backend, and the path to a dependency file is easily set incorrectly in the training code during this migration.
As a general solution, it is recommended that you obtain the absolute path to a dependency file through the OS API.
Example:
|---project_root                # Root directory for code
    |---BootfileDirectory       # Directory where the boot file is located
        |---bootfile.py         # Boot file
    |---otherfileDirectory      # Directory where other dependency files are located
        |---otherfile.py        # Other dependency files
Do as follows to obtain the path to a dependency file, otherfile_path in this example, in the boot file:
import os
current_path = os.path.dirname(os.path.realpath(__file__))  # Directory where the boot file is located
project_root = os.path.dirname(current_path)  # Root directory of the project, which is the code directory set on the ModelArts training console
otherfile_path = os.path.join(project_root, "otherfileDirectory", "otherfile.py")
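Building on the approach above, the following sketch also checks that the resolved file exists, so that a wrong path fails early with a message that names the full path. The helper name resolve_dependency is hypothetical and is not part of any ModelArts API; it assumes the same directory layout as the example, with the boot file one level below the project root.

```python
import os


def resolve_dependency(boot_file: str, *parts: str) -> str:
    """Build an absolute path relative to the project root and verify that it exists."""
    # Directory where the boot file is located.
    current_path = os.path.dirname(os.path.realpath(boot_file))
    # Project root: assumed to be the parent of the boot file directory.
    project_root = os.path.dirname(current_path)
    path = os.path.join(project_root, *parts)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"dependency not found: {path} (project root: {project_root})"
        )
    return path


# Usage inside the boot file:
# otherfile_path = resolve_dependency(__file__, "otherfileDirectory", "otherfile.py")
```

Because the path is derived from the location of the boot file, the same code works both locally and in the training container, regardless of the current working directory.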
Take OBS path obs://obs-bucket/training-test/demo-code as an example. The training code in this path will be automatically downloaded to ${MA_JOB_DIR}/demo-code in the training container, where demo-code is the last-level directory of the OBS path and can be customized.
If you use a custom image to create a training job, the system will automatically run the image boot command after the code directory is downloaded. The boot command must comply with the following rules:
Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.