After a model is prepared, you can deploy it as a real-time service and then call the service to make predictions.
A maximum of one real-time service can be deployed.

| Parameter | Description |
|---|---|
| Name | Name of the real-time service. Set this parameter as prompted. |
| Auto Stop | After this parameter is enabled and the auto stop time is set, a service automatically stops at the specified time. The auto stop function is enabled by default, and the default value is 1 hour later. The options are 1 hour later, 2 hours later, 4 hours later, 6 hours later, and Custom. If you select Custom, you can enter any integer from 1 to 24 hours in the textbox on the right. |
| Description | Brief description of the real-time service. |

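
The Auto Stop rules in the table above can be sketched as a small check. This is a minimal illustration only: the function names are hypothetical and are not part of any ModelArts SDK, and it assumes that "N hours later" is counted from the moment the service starts.

```python
from datetime import datetime, timedelta

# Preset choices listed in the table; Custom accepts any integer from 1 to 24.
PRESET_HOURS = (1, 2, 4, 6)
DEFAULT_HOURS = 1  # auto stop defaults to "1 hour later"


def validate_auto_stop_hours(hours: int = DEFAULT_HOURS) -> int:
    """Return hours unchanged if it is an integer from 1 to 24, else raise."""
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise ValueError("auto stop hours must be an integer")
    if not 1 <= hours <= 24:
        raise ValueError("auto stop hours must be between 1 and 24")
    return hours


def auto_stop_time(started_at: datetime, hours: int = DEFAULT_HOURS) -> datetime:
    """Compute the stop time, assuming the countdown begins at started_at."""
    return started_at + timedelta(hours=validate_auto_stop_hours(hours))


def is_preset(hours: int) -> bool:
    """True if hours matches a preset option rather than a Custom value."""
    return hours in PRESET_HOURS
```

For example, `auto_stop_time(datetime(2024, 1, 1, 12, 0), 4)` yields 16:00 on the same day, and a value such as 5 is valid only through the Custom option.
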

| Parameter | Sub-Parameter | Description |
|---|---|---|
| Resource Pool | Public resource pools | Instances in the public resource pool can be of the CPU or GPU type. |
| Resource Pool | Dedicated resource pools | For details about how to create a dedicated resource pool, see Creating a Dedicated Resource Pool. You can select a specification from the resource pool specifications. |
| Model and Configuration | Model Source | Select My Models or My Subscriptions as needed. The models that match the selected source are displayed. |
| | Model | The system automatically lists the available models. Select a model in the Normal status and its version. |
| | Traffic Ratio (%) | Set the traffic proportion of the current model version. Service calling requests are allocated to the current version based on this proportion. If you deploy only one version of a model, set this parameter to 100%. If you select multiple versions for gated launch, ensure that the traffic ratios of all versions add up to 100%. |
| | Specifications | If you select Public resource pools, you can select CPU or GPU resources as needed. For details, see Table 3. |
| | Compute Nodes | Set the number of instances for the current model version. If you set Compute Nodes to 1, the standalone computing mode is used. If you set Compute Nodes to a value greater than 1, the distributed computing mode is used. Select a computing mode based on the actual requirements. |
| | Environment Variable | Set environment variables and inject them into the container instance. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables. |
| | Add Model and Configuration | ModelArts supports multiple model versions and flexible traffic policies. You can use gated launch to smoothly upgrade the model version. NOTE: If the selected model has only one version, the system does not display Add Model Version and Configuration. |
| Traffic Limit | N/A | Maximum number of times a service can be accessed within a second. You can set this parameter as needed. |

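
The Traffic Ratio and Compute Nodes rules above can be checked before you submit a deployment. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, and it assumes that ratios are whole percentages between 0 and 100, which the table does not state explicitly.

```python
def validate_traffic_ratios(ratios: dict[str, int]) -> None:
    """Raise ValueError unless the per-version ratios add up to exactly 100.

    ratios maps a model version label to its traffic ratio in percent.
    """
    if not ratios:
        raise ValueError("at least one model version is required")
    for version, ratio in ratios.items():
        # Assumption: each ratio is a whole percentage from 0 to 100.
        if not 0 <= ratio <= 100:
            raise ValueError(f"ratio for {version!r} must be between 0 and 100")
    total = sum(ratios.values())
    if total != 100:
        raise ValueError(f"traffic ratios add up to {total}, expected 100")


def computing_mode(compute_nodes: int) -> str:
    """Map the number of compute nodes to the computing mode from the table."""
    if compute_nodes < 1:
        raise ValueError("compute nodes must be at least 1")
    return "standalone" if compute_nodes == 1 else "distributed"
```

A single version passes with `{"0.0.1": 100}`. A gated launch such as `{"0.0.1": 80, "0.0.2": 20}` also passes, whereas `{"0.0.1": 80, "0.0.2": 10}` is rejected because the ratios add up to 90.
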

| Specifications | Description |
|---|---|
| ExeML specifications (CPU), ExeML specifications (GPU) | Can be used only by models trained in ExeML projects. |
| CPU: 2 vCPUs \| 8 GiB | Suitable for models with only CPU loads. |
| CPU: 8 vCPUs \| 32 GiB; GPU: 1 x T4 | Suitable for models requiring CPU and GPU (NVIDIA T4) resources. |

After a real-time service is deployed, it is started immediately.
You can view basic information about the service in the real-time service list. When the status of the newly deployed service changes from Deploying to Running, the service has been deployed successfully.
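
If you monitor the deployment from a script instead of the console, the Deploying to Running transition can be polled. The helper below is a generic sketch: `get_status` is a hypothetical callable that you supply (for example, a wrapper around whatever service-query call you use), and the status strings are assumed to match the console labels. Treating any other status as an error is also an assumption.

```python
import time


def wait_until_running(get_status, timeout_s=600.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "Running".

    get_status is a caller-supplied function returning the current status
    string. sleep and clock are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "Running":
            return status
        if status != "Deploying":
            # Assumption: any status other than Deploying means failure.
            raise RuntimeError(f"unexpected service status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError("service did not reach Running in time")
        sleep(interval_s)
```

The timeout and polling interval defaults are arbitrary; adjust them to the deployment times you typically observe.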