Error Message "No module named .*" Displayed in Training Job Logs

Perform the following operations to locate the fault:

  1. Checking Whether the Dependency Package Is Available
  2. Checking Whether the Dependency Package Path Can Be Detected
  3. Checking Whether the Selected Resource Flavor Is Correct
  4. Summary and Suggestions

Checking Whether the Dependency Package Is Available

If the dependency package is unavailable, use either of the following methods to install it:

With method 1, the dependency package is downloaded and installed before the training job starts. With method 2, the dependency package is downloaded and installed while the boot file runs.
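As a rough illustration of method 2, the boot file could check for the module and install it with pip before importing it. This is a minimal sketch, not a ModelArts API: the helper name ensure_package is hypothetical, and it assumes that pip is available in the training image and that the container can reach a package index.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Install a package with pip if its top-level module cannot be found.

    module_name: top-level import name, for example "yaml".
    pip_name: name on the package index if it differs, for example "pyyaml".
    Returns True if an installation was attempted, False if the module
    was already available.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the interpreter that runs the boot file so the package lands
    # in the same environment (assumes pip is installed there).
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    # Make the newly installed package visible to the import system.
    importlib.invalidate_caches()
    return True
```

Calling ensure_package at the top of the boot file, before the first import of the dependency, keeps the installation and the import in the same process.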

Checking Whether the Dependency Package Path Can Be Detected

Before running the code locally, you must add project_dir to PYTHONPATH or install project_dir in site-packages. In ModelArts, you can resolve this issue by adding project_dir to sys.path.

Use from module_dir import module_file to import the module. The code structure is as follows:

project_dir
|- main.py
|- module_dir
|  |- __init__.py
|  |- module_file.py
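
For the structure above, adding project_dir to sys.path could look like the following. This is a minimal sketch: the helper name add_project_dir is hypothetical, and it assumes that main.py is the boot file located directly in project_dir.

```python
import os
import sys


def add_project_dir(project_dir):
    """Put project_dir at the front of sys.path if it is not there yet.

    Returns the absolute path that was checked or added.
    """
    project_dir = os.path.abspath(project_dir)
    if project_dir not in sys.path:
        # Insert at index 0 so packages in project_dir are found first.
        sys.path.insert(0, project_dir)
    return project_dir


# Intended use at the top of main.py (assumed to sit in project_dir):
#   add_project_dir(os.path.dirname(os.path.abspath(__file__)))
#   from module_dir import module_file
```

The path is derived from the location of main.py rather than from the current working directory, because the working directory of a training job may differ from the directory that holds the code.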

Checking Whether the Selected Resource Flavor Is Correct

The error message "No module named npu_bridge.npu_init" is displayed in the logs of a training job:

from npu_bridge.npu_init import *
ImportError: No module named npu_bridge.npu_init

Check whether the flavor used by the training job supports NPUs. A likely cause is that a non-NPU flavor, for example, a GPU flavor, was selected for the job. The NPU modules are then unavailable, so the import fails.
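
To make this failure easier to recognize in the logs, the boot file could check for the package before importing it. This is a minimal sketch: the helper name module_available is hypothetical, and the check only reports whether the npu_bridge package can be found in the current environment. It does not query the flavor itself.

```python
import importlib.util


def module_available(top_level_name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(top_level_name) is not None


if not module_available("npu_bridge"):
    # Assumption: a missing npu_bridge package usually means that the
    # job is not running on an NPU flavor.
    print("npu_bridge not found: check that the training job uses an NPU flavor")
```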

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to eliminate as many code migration errors as possible.