This API is used to deploy a model as a service.
POST /v1/{project_id}/services
| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| service_name | Yes | String | Service name, 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
| description | No | String | Service description, which contains a maximum of 100 characters. By default, this parameter is left blank. |
| infer_type | Yes | String | Inference mode. The value can be real-time or batch. |
| workspace_id | No | String | ID of the workspace to which the service belongs. The default value is 0, indicating the default workspace. |
| vpc_id | No | String | ID of the VPC to which a real-time service instance is deployed. By default, this parameter is left blank. |
| subnet_network_id | No | String | ID of a subnet. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. Enter the network ID displayed in the subnet details on the VPC console. A subnet provides dedicated network resources that are isolated from other networks. |
| security_group_id | No | String | ID of a security group. By default, this parameter is left blank. This parameter is mandatory when vpc_id is configured. A security group is a virtual firewall that provides secure network access control policies for service instances. The security group must contain at least one inbound rule that permits TCP requests from source address 0.0.0.0/0 on port 8080. |
| cluster_id | No | String | ID of a dedicated resource pool. By default, this parameter is left blank, indicating that no dedicated resource pool is used. When using a dedicated resource pool to deploy services, ensure that the resource pool is running properly. After this parameter is set, the network configuration of the cluster is used, and vpc_id does not take effect. If this parameter is configured together with cluster_id in the real-time config, the cluster_id in the real-time config takes precedence. |
| config | Yes | config array corresponding to infer_type | Model running configuration. If infer_type is batch, only one model can be configured. If infer_type is real-time, multiple models can be configured and assigned traffic weights based on service requirements; however, no two entries may use the same model version. |
| schedule | No | schedule array | Service scheduling configuration, which can be configured only for real-time services. By default, this parameter is left blank and the service runs continuously. For details, see Table 5. |
| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| weight | Yes | Integer | Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The weights of all models must add up to 100. |
| specification | Yes | String | Resource specifications. Select specifications based on service requirements. |
| custom_spec | No | Object | Custom specifications. Set this parameter when you use a dedicated resource pool. For details, see Table 6. |
| instance_count | Yes | Integer | Number of instances deployed for a model. The value must be greater than 0. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables. |
| cluster_id | No | String | ID of a dedicated resource pool. By default, this parameter is left blank, indicating that no dedicated resource pool is used. |
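Several of the constraints above (service name pattern, weight sum for real-time services, positive instance counts) can be checked client-side before sending the request. Below is a minimal, hypothetical validation sketch; the field names follow the tables in this section, but the helper itself is not part of the API:

```python
import re

def validate_deploy_request(body: dict) -> list:
    """Return a list of constraint violations for a deployment request body."""
    errors = []
    # service_name: 1-64 characters; letters, digits, hyphens, underscores only
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", body.get("service_name", "")):
        errors.append("service_name must be 1-64 letters/digits/-/_")
    if len(body.get("description", "")) > 100:
        errors.append("description must be at most 100 characters")
    if body.get("infer_type") not in ("real-time", "batch"):
        errors.append("infer_type must be 'real-time' or 'batch'")
    config = body.get("config", [])
    if body.get("infer_type") == "batch" and len(config) != 1:
        errors.append("batch services accept exactly one model config")
    if body.get("infer_type") == "real-time" and config:
        # Traffic weights across all models must add up to 100
        if sum(int(c.get("weight", 0)) for c in config) != 100:
            errors.append("weights of real-time models must sum to 100")
    for c in config:
        if int(c.get("instance_count", 0)) <= 0:
            errors.append("instance_count must be greater than 0")
    return errors

body = {
    "service_name": "mnist",
    "infer_type": "real-time",
    "config": [{"model_id": "xxx", "weight": "70", "instance_count": 1},
               {"model_id": "yyy", "weight": "30", "instance_count": 1}],
}
print(validate_deploy_request(body))  # []
```

Note that weight values are accepted as strings in the sample requests below, so the sketch coerces them with `int()` before summing.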
| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Options: modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 |
| instance_count | Yes | Integer | Number of instances deployed for a model. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables. |
| src_type | No | String | Data source type. This parameter can be set to ManifestFile. By default, this parameter is left blank, indicating that only files in the src_path directory are read. If this parameter is set to ManifestFile, src_path must be a specific manifest file path. You can specify multiple data paths in the manifest file. |
| src_path | Yes | String | OBS path of the input data of a batch job |
| dest_path | Yes | String | OBS path of the output data of a batch job |
| req_uri | Yes | String | Inference API called in a batch task, which is a REST API in the model image. Select an API URI from the model's config.json file for inference. If a ModelArts built-in inference image is used, the value of this parameter is /. |
| mapping_type | Yes | String | Mapping type of the input data. The value can be file or csv. |
| mapping_rule | No | Map | Mapping between input parameters and CSV data. This parameter is mandatory only when mapping_type is set to csv. The mapping rule comes from the input parameters (input_params) in the model configuration file config.json. When type is set to string, number, integer, or boolean, you need to configure the index parameter. For details, see . The index must be a nonnegative integer starting from 0; if a value of index does not comply with this rule, the corresponding parameter is ignored in the request. After the mapping rule is configured, the corresponding CSV data must be separated by commas (,). |
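To illustrate how the index values in mapping_rule line up with CSV columns, here is a hypothetical helper (not part of the API) that turns one comma-separated line into the req_data structure used by the model's inference request:

```python
def csv_line_to_req_data(line: str, index_map: dict) -> dict:
    """Map CSV columns to model input parameters by their 'index' values.

    index_map: {input_param_name: column_index}, as declared in mapping_rule.
    """
    columns = line.split(",")  # the CSV data must be comma-separated
    req = {name: float(columns[idx]) for name, idx in index_map.items()}
    return {"data": {"req_data": [req]}}

# Indices as in the csv mapping sample request later in this section:
# input5 reads column 0, input4 column 1, ..., input1 column 4
index_map = {"input5": 0, "input4": 1, "input3": 2, "input2": 3, "input1": 4}
print(csv_line_to_req_data("1,2,3,4,5", index_map))
```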
| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| type | Yes | String | Scheduling type. Only the value stop is supported. |
| time_unit | Yes | String | Scheduling time unit, for example, HOURS. |
| duration | Yes | Integer | Value that maps to the time unit. For example, if the task stops after two hours, set time_unit to HOURS and duration to 2. |
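The stop time implied by a schedule entry can be computed as in the sketch below. The DAYS and MINUTES unit names are assumptions for illustration; only HOURS appears in this section:

```python
from datetime import datetime, timedelta

# Map an assumed schedule time_unit name to a timedelta keyword
_UNITS = {"DAYS": "days", "HOURS": "hours", "MINUTES": "minutes"}

def stop_time(start: datetime, time_unit: str, duration: int) -> datetime:
    """Return when a service with a 'stop' schedule entry would be stopped."""
    return start + timedelta(**{_UNITS[time_unit]: duration})

start = datetime(2024, 1, 1, 12, 0)
print(stop_time(start, "HOURS", 2))  # 2024-01-01 14:00:00
```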
The following shows how to deploy different types of services.
Sample request of deploying a real-time service:

```
POST https://endpoint/v1/{project_id}/services

{
  "service_name": "mnist",
  "description": "mnist service",
  "infer_type": "real-time",
  "config": [
    {
      "model_id": "xxxmodel-idxxx",
      "weight": "100",
      "specification": "modelarts.vm.cpu.2u",
      "instance_count": 1
    }
  ]
}
```
Sample request of deploying a real-time service with two models sharing traffic:

```json
{
  "service_name": "mnist",
  "description": "mnist service",
  "infer_type": "real-time",
  "config": [
    {
      "model_id": "xxxmodel-idxxx",
      "weight": "70",
      "specification": "modelarts.vm.cpu.2u",
      "instance_count": 1,
      "envs": {
        "model_name": "mxnet-model-1",
        "load_epoch": "0"
      }
    },
    {
      "model_id": "xxxxxx",
      "weight": "30",
      "specification": "modelarts.vm.cpu.2u",
      "instance_count": 1
    }
  ]
}
```
Sample request of deploying a real-time service in a dedicated resource pool with custom specifications:

```json
{
  "service_name": "realtime-demo",
  "description": "",
  "infer_type": "real-time",
  "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
  "config": [{
    "model_id": "eb6a4a8c-5713-4a27-b8ed-c7e694499af5",
    "weight": "100",
    "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
    "specification": "custom",
    "custom_spec": {
      "cpu": 1.5,
      "memory": 7500,
      "gpu_p4": 0
    },
    "instance_count": 1
  }]
}
```
Sample request of deploying a real-time service that is scheduled to stop:

```json
{
  "service_name": "service-demo",
  "description": "demo",
  "infer_type": "real-time",
  "config": [{
    "model_id": "xxxmodel-idxxx",
    "weight": "100",
    "specification": "modelarts.vm.cpu.2u",
    "instance_count": 1
  }],
  "schedule": [{
    "type": "stop",
    "time_unit": "HOURS",
    "duration": 1
  }]
}
```
Sample request of deploying a batch service with file mapping:

```json
{
  "service_name": "batchservicetest",
  "description": "",
  "infer_type": "batch",
  "cluster_id": "8abf68a969c3cb3a0169c4acb24b****",
  "config": [{
    "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
    "specification": "modelarts.vm.cpu.2u",
    "instance_count": 1,
    "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
    "dest_path": "https://infers-data.obs.dxxxx.com/output/",
    "req_uri": "/",
    "mapping_type": "file"
  }]
}
```
Sample request of deploying a batch service with CSV mapping:

```json
{
  "service_name": "batchservicetest",
  "description": "",
  "infer_type": "batch",
  "config": [{
    "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
    "specification": "modelarts.vm.cpu.2u",
    "instance_count": 1,
    "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
    "dest_path": "https://infers-data.obs.xxxx.com/output/",
    "req_uri": "/",
    "mapping_type": "csv",
    "mapping_rule": {
      "type": "object",
      "properties": {
        "data": {
          "type": "object",
          "properties": {
            "req_data": {
              "type": "array",
              "items": [{
                "type": "object",
                "properties": {
                  "input5": {"type": "number", "index": 0},
                  "input4": {"type": "number", "index": 1},
                  "input3": {"type": "number", "index": 2},
                  "input2": {"type": "number", "index": 3},
                  "input1": {"type": "number", "index": 4}
                }
              }]
            }
          }
        }
      }
    }
  }]
}
```
The mapping_rule above corresponds to the following inference request body:

```json
{
  "data": {
    "req_data": [{
      "input1": 1,
      "input2": 2,
      "input3": 3,
      "input4": 4,
      "input5": 5
    }]
  }
}
```
Sample response:

```json
{
  "service_id": "10eb0091-887f-4839-9929-cbc884f1e20e",
  "resource_ids": [
    "INF-f878991839647358@1598319442708"
  ]
}
```
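Putting the pieces together, a call to this API could be issued as sketched below with Python's standard library. The endpoint, project ID, and token values are placeholders you must supply; the sketch assumes the service is authenticated with an X-Auth-Token header, as is usual for this cloud's APIs:

```python
import json
import urllib.request

def build_deploy_request(endpoint: str, project_id: str, token: str, body: dict):
    """Build the POST /v1/{project_id}/services request without sending it."""
    url = f"https://{endpoint}/v1/{project_id}/services"
    headers = {"Content-Type": "application/json", "X-Auth-Token": token}
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")

req = build_deploy_request(
    "endpoint.example.com",   # placeholder endpoint
    "my-project-id",          # placeholder project ID
    "my-token",               # placeholder IAM token
    {
        "service_name": "mnist",
        "description": "mnist service",
        "infer_type": "real-time",
        "config": [{"model_id": "xxxmodel-idxxx", "weight": "100",
                    "specification": "modelarts.vm.cpu.2u", "instance_count": 1}],
    },
)
print(req.full_url)  # https://endpoint.example.com/v1/my-project-id/services
# To actually send it: resp = urllib.request.urlopen(req); json.load(resp)
```

Building the request separately from sending it keeps the URL and headers easy to inspect before any network traffic occurs.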
For details about the status code, see Table 1.
See Error Codes.