Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version

Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>

2024-11-02 09:04:52 +00:00


Error Message "CUDNN_STATUS_NOT_SUPPORTED" Displayed in Logs

Symptom

The following error message is displayed during PyTorch training:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Possible Causes

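The input tensor is not contiguous in memory, which cuDNN does not support. This typically happens after a view operation such as permute() or transpose(), which reorders dimensions without copying the underlying data.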
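To make the cause concrete, the following is a minimal pure-Python sketch of how strides determine contiguity. The helper names (contiguous_strides, permute_view, is_contiguous) are hypothetical and exist only for illustration; they are not PyTorch APIs, and the check is simplified (for example, it does not special-case dimensions of size 1 as PyTorch does).

```python
def contiguous_strides(shape):
    """Return row-major strides (in elements) for a densely packed tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permute_view(shape, strides, order):
    """Reorder dimensions without moving data, the way a view does."""
    new_shape = tuple(shape[i] for i in order)
    new_strides = tuple(strides[i] for i in order)
    return new_shape, new_strides


def is_contiguous(shape, strides):
    """Simplified check: strides must match the dense row-major layout."""
    return strides == contiguous_strides(shape)


# Hypothetical NHWC batch: (batch, height, width, channels).
nhwc_shape = (2, 4, 4, 3)
nhwc_strides = contiguous_strides(nhwc_shape)

# Reorder to NCHW as a view: the shape changes, the memory does not.
nchw_shape, nchw_strides = permute_view(nhwc_shape, nhwc_strides, (0, 3, 1, 2))

print(nhwc_strides, is_contiguous(nhwc_shape, nhwc_strides))
print(nchw_strides, is_contiguous(nchw_shape, nchw_strides))
```

The permuted view keeps the original memory order, so its strides no longer match the dense layout expected for its new shape. Copying the data into that dense layout is what makes a tensor contiguous again.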

Solution

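Disable cuDNN before training. This is the quickest workaround, but PyTorch then falls back to its non-cuDNN kernels, which can be slower.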
```
torch.backends.cudnn.enabled = False
```
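If disabling cuDNN for the whole job costs too much performance, one option is to disable it only around the operation that fails. The sketch below assumes the torch.backends.cudnn.flags context manager is available in your PyTorch version; the layer and input are hypothetical placeholders and run on the CPU so the snippet works without a GPU.

```python
import torch

# Hypothetical layer and input, used only to have something to run.
conv = torch.nn.Conv2d(3, 8, kernel_size=3)
x = torch.randn(1, 3, 16, 16)

# cuDNN is disabled only inside this block; the previous setting
# is restored when the block exits.
with torch.backends.cudnn.flags(enabled=False):
    enabled_inside = torch.backends.cudnn.enabled
    y = conv(x)

enabled_after = torch.backends.cudnn.enabled
print(enabled_inside, enabled_after, tuple(y.shape))
```

This keeps cuDNN acceleration for the rest of the model while sidestepping the unsupported call.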

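Alternatively, convert the input into a contiguous tensor before passing it to the model. permute() returns a non-contiguous view, so call contiguous() on the result.

images = images.cuda()
images = images.permute(0, 3, 1, 2).contiguous()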
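To confirm whether a tensor is affected, you can check it with is_contiguous() before and after the conversion. The following is a small sketch that runs on the CPU; the tensor shape is a hypothetical NHWC batch chosen for illustration.

```python
import torch

# Hypothetical batch in NHWC layout: (batch, height, width, channels).
images = torch.zeros(2, 4, 4, 3)

# permute() returns a view with reordered strides; no data is copied.
nchw_view = images.permute(0, 3, 1, 2)
print(nchw_view.is_contiguous(), nchw_view.stride())

# contiguous() copies the data into a dense layout for the new shape.
nchw = nchw_view.contiguous()
print(nchw.is_contiguous(), nchw.stride())
```

Calling contiguous() on a tensor that is already contiguous returns the tensor itself without copying, so adding the call defensively is cheap.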

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to catch as many code migration errors as possible.

