Some companies and clients need to deploy models to the cloud without retraining them in the AWS environment, since retraining can change performance, alter metrics, and ultimately fail to meet the original requirements.
This post shows how to deploy an Xgboost model binary built by a developer, adding a post-processing layer through a SageMaker inference pipeline and exposing the result as an endpoint.
Xgboost algorithm
Tree-based ensemble methods frequently achieve good performance and also offer an interpretation of the variables they use, which makes them popular within the community of machine learning solution developers. Extreme gradient boosting (Xgboost) is a variant of tree-based ensemble methods that handles sparse data well, uses a minimal amount of resources, and is highly scalable. Xgboost is a supervised learning model in which learning is sequential: each learner corrects the error of the previous one, which makes it an adaptive algorithm, and gradient descent drives the learning. The next figure shows how gradient boosting works:
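To make the later steps concrete, here is a minimal sketch of how a client-side model such as model_client.pkl could be produced with the xgboost scikit-learn API; the dataset and hyperparameters are illustrative assumptions, not the client's actual training code.

import joblib
import xgboost
from sklearn.datasets import load_breast_cancer

# Toy binary-classification data standing in for the client's dataset.
X, y = load_breast_cancer(return_X_y=True)

# Train a small gradient boosting classifier.
model = xgboost.XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, y)

# Persist the fitted model the same way the client would.
joblib.dump(model, 'model_client.pkl')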
Add the booster from .pkl to tar.gz
The key process, and the central theme of this post, is covered in this section. The fundamental artifact produced by training a tree-based model is the booster. When we train and save a model with the Xgboost library, a series of attributes from the modeling stage are saved along with it that contribute nothing at inference time. By rescuing only the booster from the format in which the model was saved, we can communicate with the pre-built AWS Xgboost solution and deploy and use the model.
import xgboost  # needed so joblib can unpickle the XGBClassifier object
import joblib
import tarfile

# Load the client's pickled model and extract only the booster.
model_pkl = joblib.load('model_client.pkl')
booster = model_pkl.get_booster()
booster.save_model('xgboost-model')

# Add xgboost-model to a tar.gz file, as expected by the SageMaker container.
fp = tarfile.open("model.tar.gz", "w:gz")
fp.add('xgboost-model')
fp.close()
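The archive then has to be available in S3, since the model creation below references its S3 URI. A minimal sketch of the upload with the SageMaker SDK; the bucket name and key prefix are placeholders you should replace:

import sagemaker

session = sagemaker.Session()

# Upload model.tar.gz to S3; the returned URI is what model_data points to below.
model_uri = session.upload_data(path='model.tar.gz',
                                bucket='your-bucket',       # placeholder bucket
                                key_prefix='xgboost-model')  # placeholder prefix
print(model_uri)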
Create the model in SageMaker
The first step is to indicate the URI of the algorithm's container. In this case, we use the container provided by AWS:
import sagemaker
from sagemaker.session import Session

region = Session().boto_region_name
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.0-1")
The next step is to create a model with the SageMaker SDK, providing the location of the artifacts in S3 and the algorithm container:
from sagemaker.model import Model

xgboost_model = Model(xgboost_container,
                      model_data='s3://file_path_in_s3/model.tar.gz',
                      role=sagemaker.get_execution_role())
Setting up the inference pipeline
The next step is to set up the processing of the model's output. For this, create a post-processing model through SKLearnModel.
Post processing
from sagemaker.sklearn.model import SKLearnModel

FRAMEWORK_VERSION = '0.23-1'
entry_point = 'postprocessing.py'

postprocessing_model = SKLearnModel(
    model_data='s3://file_path_in_s3/model.tar.gz',
    role=sagemaker.get_execution_role(),
    entry_point=entry_point,
    framework_version=FRAMEWORK_VERSION,
    sagemaker_session=sagemaker.Session()
)
The entry point is a Python file containing the functions that manage the model's output (strings) and associate it with a context. For this, consider whether the problem is binary or multi-class, and the context of the project. The following is an extract of that code:
import cgi


def output_fn(prediction, accept):
    accept, params = cgi.parse_header(accept.lower())
    if accept == "application/json":
        results = []
        classes = prediction['classes']
        # The Xgboost container returns only the positive-class probability,
        # so the complementary probability is prepended for class-0.
        score = prediction['scores']
        score.insert(0, 1 - score[0])
        score = [score]
        for scores in score:
            # Pair each class name with its score.
            row = []
            for class_, class_score in zip(classes, scores):
                row.append({
                    'id': class_,
                    'score': class_score
                })
            results.append(row)
        json_output = {"context": results[0]}
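For the pipeline to work end to end, the entry point also needs the other SageMaker scikit-learn serving handlers. The sketch below is an assumption of how they could look for a pure post-processing container: the class names, the CSV parsing of the Xgboost output, and the pass-through predict_fn are illustrative, not part of the original project.

def model_fn(model_dir):
    # No artifact is needed for post-processing; return a placeholder.
    return None


def input_fn(request_body, content_type):
    # The Xgboost container in the pipeline emits comma-separated scores.
    if content_type == 'text/csv':
        return [float(value) for value in request_body.strip().split(',')]
    raise ValueError('Unsupported content type: {}'.format(content_type))


def predict_fn(input_data, model):
    # Build the structure that output_fn expects.
    return {'classes': ['class-0', 'class-1'], 'scores': input_data}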
Pipeline model
from sagemaker.pipeline import PipelineModel

model_name = 'name-model'
inference_model = PipelineModel(
    name=model_name,
    role=sagemaker.get_execution_role(),
    models=[
        xgboost_model,
        postprocessing_model,
    ])
Deploy and test the endpoint
Finally, we deploy both models behind a single endpoint, where they run sequentially, producing the output according to the configuration designed by the user.
# endpoint_name is a string with the name chosen for the endpoint.
endpoint_name = 'name-endpoint'

inference_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=endpoint_name
)
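To test the endpoint, you can invoke it with the boto3 SageMaker runtime client. This is a minimal sketch; the payload is a hypothetical CSV row whose values and feature order are only illustrative.

import boto3

runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body='0.5,1.2,3.4,0.7'  # hypothetical feature values
)
print(response['Body'].read())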
The response when invoking the endpoint, which includes the output of the post-processing container, looks like this:
b'{"context": [{"id": "class-0", "score": 0.24162}, {"id": "class-1", "score": 0.75837}]}'
Conclusion and discussion
Using the steps listed above, you can deploy a model to the AWS Cloud while preserving the consistency and performance of the model that was built locally. A natural next step along this path is to work on the preprocessing used by the algorithm and add a preprocessing layer to the inference pipeline, configuring that stage according to your needs.