After an AI application is prepared, you can deploy it as a real-time service and then call the service for prediction.
Each user can deploy a maximum of 20 real-time services.
Real-time services deployed in the public resource pool occupy quota even when they are in the Abnormal or Stopped state. If the quota is insufficient and no more services can be deployed, delete some abnormal services to release resources.
Quota calculation:
Metering calculation:
| Parameter | Sub-Parameter | Description |
|---|---|---|
| Resource Pool | Public resource pools | Instances in the public resource pool can be of the CPU or GPU type. |
| Resource Pool | Dedicated resource pools | Select a specification from the dedicated resource pool specifications. NOTE: For more details about the new-version dedicated resource pools, see Comprehensive Upgrades to ModelArts Resource Pool Management Functions. |
| Multi-Pool Load Balancing | N/A | When this function is enabled, the service is deployed in two dedicated resource pools, and service traffic is evenly distributed between them by a load balancer. This minimizes the impact of a failure in one resource pool and improves service reliability. With this function enabled, the number of compute nodes must be a multiple of 2, with a minimum of 2. |
| AI Application and Configuration | AI Application Source | Select My AI Applications based on your requirements. |
| AI Application and Configuration | AI Application and Version | Select an AI application and version that are in the Normal state. |
| AI Application and Configuration | Traffic Ratio (%) | Set the traffic proportion for the current version. Service calls are allocated to this version based on the proportion. If you deploy only one version of an AI application, set this parameter to 100%. If you select multiple versions for a gray launch, ensure that the traffic ratios of all versions add up to 100%. |
| AI Application and Configuration | Specifications | Select available specifications from the list displayed on the console. Specifications shown in gray cannot be used in the current environment. If no specification in the public resource pools is available, no public resource pool exists in the current environment; in this case, use a dedicated resource pool or contact the administrator to create a public resource pool. NOTE: Deploying a service with the selected flavor incurs necessary system overhead, so the resources actually occupied by the service are slightly greater than the selected flavor. |
| AI Application and Configuration | Compute Nodes | Set the number of instances for the current AI application version. If you set this parameter to 1, the standalone computing mode is used. If you set it to a value greater than 1, the distributed computing mode is used. Select a computing mode based on your requirements. |
| AI Application and Configuration | Environment Variable | Set environment variables to inject into the pod. To ensure data security, do not enter sensitive information in environment variables. |
| AI Application and Configuration | Timeout | Timeout of a single model, including both the deployment and startup time. The default value is 20 minutes. The value must range from 3 to 120. |
| AI Application and Configuration | WebSocket | Whether to deploy the real-time service as a WebSocket service. |
| Add AI Application Version and Configuration | N/A | If the selected AI application has multiple versions, you can add multiple versions and configure a traffic ratio for each. You can use a gray launch to smoothly upgrade the AI application version. NOTE: Free compute specifications do not support the gray launch of multiple versions. |
| Data Collection | N/A | This function is disabled by default. When enabled, it collects and stores data generated by real-time service calls, based on the configured rules. |
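The traffic ratio mechanism for a gray launch can be sketched as weighted routing between versions. The following is a minimal illustration only; the function and version names are hypothetical and are not part of any ModelArts API:

```python
import random

def validate_traffic_ratios(ratios):
    """Check that the version traffic ratios (in %) sum to exactly 100."""
    return sum(ratios.values()) == 100

def route_request(ratios, rng=random):
    """Pick an AI application version with probability proportional to its ratio."""
    versions = list(ratios)
    weights = [ratios[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]

# Gray launch example: 30% of calls go to the new version, 70% to the old one.
ratios = {"v1": 70, "v2": 30}
assert validate_traffic_ratios(ratios)
assert not validate_traffic_ratios({"v1": 70, "v2": 20})  # must sum to 100
```

With a single deployed version, its ratio must be 100, which matches the rule stated in the table above.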
After a real-time service is deployed, it starts immediately.
In the real-time service list, the service is deployed successfully once its status changes from Deploying to Running.
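If you automate deployment, waiting for the Deploying-to-Running transition can be done with a simple polling loop. This is a generic sketch: `fetch_status` is a stand-in for however you actually query the service state (for example, via an SDK or API call), not a real ModelArts function:

```python
import time

def wait_until_running(fetch_status, interval_s=5, max_checks=60):
    """Poll a status callable until the service reports 'Running'.

    fetch_status: a zero-argument callable returning the current status string,
    e.g. 'Deploying', 'Running', or an error state such as 'Abnormal'.
    """
    for _ in range(max_checks):
        status = fetch_status()
        if status == "Running":
            return True
        if status != "Deploying":
            raise RuntimeError(f"deployment failed with status: {status}")
        time.sleep(interval_s)
    return False  # still not Running after max_checks polls

# Simulated status sequence, for illustration only.
states = iter(["Deploying", "Deploying", "Running"])
print(wait_until_running(lambda: next(states), interval_s=0))  # True
```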