File and folder structure
From this point on, we will work with several files (the ML model, the score.py file, the web service schema etc.). I recommend putting all of these files under a single root directory, similar to this structure:
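The exact names are up to you; the files shown here are simply the ones we will create throughout this article:

    <root directory>
    ├── model
    │   └── model.h5      (the exported Keras model)
    ├── score.py          (scoring code and web service entry point)
    ├── conda.yml         (Conda dependency file for the Docker image)
    └── schema.json       (generated web service schema)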
I will cover the specific files as we go through the deployment configuration process. The bottom line is that everything works best when it lives in one place. In the CLI, navigate to this directory.
Model registration
To get our machine learning model into Azure, we first need to export it to a file (or several). This will differ from framework to framework, but for our example model written in Keras, it’s as simple as
model.save("model.h5")
When you have the file, put it into the “model” subdirectory of the root directory chosen above. We now need to set the target model management account (remember, we named it “amlwmodelmanagement” in the previous article):
az ml account modelmanagement set -n amlwmodelmanagement -g amlw
To register the model with the management account, run the following:
az ml model register --model model/model.h5 --name AMLWModel
You should see a confirmation like this one:
Mark down the ID of the model – we will need to specify it later when creating the web service manifest.
Notice that the model is referred to by the “model/model.h5” path – the CLI registers it under the path relative to the current directory. We will need to remember this later when we load and use the model.
If you now go to the Azure portal and look into the model management UI, you should be able to see your model:
So the model is ready – let’s start creating the web service to make use of it.
Web service schema
We are going to write some Python code next. Open the root directory containing our model’s directory and create a file called “score.py”. Put the following code inside the file:
model = None

def init():
    # this will load and initialize our model
    return

def run(inputs):
    # this will call the loaded model and do the scoring
    return

def generate_api_schema():
    # this creates an input and output schema for our web service and saves it into a JSON file
    from azureml.api.schema.dataTypes import DataTypes
    from azureml.api.schema.sampleDefinition import SampleDefinition
    from azureml.api.realtime.services import generate_schema

    inputs = {"input_array": SampleDefinition(DataTypes.STANDARD, "dGVzdA==")}
    outputs = {"coordinates": SampleDefinition(DataTypes.STANDARD, [1, 2, 4, 5, 8, 9, 5, 6])}
    print(generate_schema(inputs=inputs, filepath="schema.json", run_func=run, outputs=outputs))

generate_api_schema()
The actual content for “inputs” and “outputs” depends on your model and use case. For our example model (a computer vision task), the “run” function takes a base64-encoded byte array representing an image, transforms it into a valid input for our model, then calls the model. We’ll get to the implementation in short order. There are various configuration options available to customize the schema; you can find the details in the documentation.
Now run the code:
python score.py
This will create a “schema.json” file in our working directory. This file will be used to describe our web service endpoint and validate inputs. When you are satisfied with your schema, delete the last line of score.py (the generate_api_schema() call) – it’s not necessary anymore once the schema is created.
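If you are curious what was generated, the schema is a plain JSON file and can be inspected directly – a quick, optional sanity check:

import json

# print the generated schema – purely a sanity check, not a required deployment step
with open("schema.json") as schema_file:
    print(json.dumps(json.load(schema_file), indent=2))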
Now let’s actually implement the web service!
Web service endpoint implementation
First off, the whole implementation will live in the score.py file. Loading the model is easy – it will be provided automatically under the relative path under which it was registered – in our case, this path is “model/model.h5”:
def init():
    from keras.models import load_model as load_keras_model

    global model
    local_path = 'model/model.h5'
    model = load_keras_model(local_path)
Again, the actual implementation will differ depending on what you want to do. At any rate, our model is loaded and ready to be used.
The run function takes in the arguments defined by the schema, does any pre-processing we need, triggers the model, then returns the prediction:
def run(input_array):
    import base64
    import io

    import numpy as np
    from PIL import Image

    img = Image.open(io.BytesIO(base64.b64decode(input_array)))
    data = np.array(img)
    data = data / 255
    data = np.expand_dims(data, axis=0)
    prediction = model.predict(data)
    return {"coordinates": prediction.tolist()[0]}
Since our example model takes an image as input, we decode the base64 string into a byte array, turn it into an image, normalize it, and feed it into the model. We take the prediction (an array of 8 numbers) and wrap it in a result object.
Once again, this code can do whatever your model needs it to do. Mileage may vary. It’s a good idea to keep the code in the score.py file as short as possible so it doesn’t hinder performance. If I needed to do some preprocessing, such as resizing the input picture to the correct size and turning it black and white, I could do that on the caller’s side (see the sketch below) or maybe use a serverless function as a proxy.
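To illustrate the caller-side option, here is a minimal sketch of such preprocessing with Pillow – the 224×224 target size is just a placeholder for whatever dimensions your model actually expects:

import base64
import io

from PIL import Image


def prepare_payload(image_path, size=(224, 224)):
    # resize to the (assumed) expected dimensions and convert to grayscale
    img = Image.open(image_path).resize(size).convert("L")
    buffer = io.BytesIO()
    img.save(buffer, format="PNG")
    # base64-encode the bytes so they fit into the JSON request payload
    return base64.b64encode(buffer.getvalue()).decode("ascii")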
With this, the hard parts are over – we can now package everything into a Docker image.
Web service manifest
To package the web service, we first need to create a manifest file for it. This manifest is used by Azure when creating the Docker image. The CLI will help us:
az ml manifest create --manifest-name AMLWManifest -f score.py -r python -i 819cfa3a21614bc5bc56cbc5746343e5 -s schema.json -c conda.yml
There is a lot to parse here, so let’s break it down:
- --manifest-name = the name of the manifest, nothing special
- -f = the score.py file that will serve incoming requests
- -r = runtime, either “python” or “spark-py”
- -i = the ID of the model that will be made available to the service (we received this when we registered the model)
- -s = the schema.json describing the service interface
- -c = the Anaconda environment configuration file – this file is used to define what non-standard Python libraries should be installed in the Docker image
The Anaconda file is crucial here – without it, most of our required libraries would be missing from our Docker image and the web service would fail to run.
Our model requires Keras, TensorFlow, Pillow, and some other dependencies to be installed. A conda file for this setup would look similar to this:
# Version of this configuration file's structure and semantics in AzureML.
# This directive is stored in a comment to preserve the Conda file structure.
# [AzureMlVersion] = 2
name: project_environment
dependencies:
  - python=3.5.2
  - pip:
    - --index-url https://azuremldownloads.azureedge.net/python-repository/preview
    - --extra-index-url https://pypi.python.org/simple
    - azureml-requirements
    - numpy
    - tensorflow
    - keras
    - Pillow
    - azure-ml-api-sdk==0.1.0a10
Azure parses this file when creating the Docker image and installs the indicated packages. Nothing simpler – we just need to remember to reference the file in the manifest creation command.
The command will also return some information about the manifest, including its ID. Mark it down for the next step.
Create a Docker image
Now that we have a manifest, we can create the Docker image. Or, more precisely, let Azure create and store it for us (we need to provide the manifest ID here):
az ml image create -n amlwimage --manifest-id 7c8efb6f-823a-4a6a-812f-401c1d13b898
Note that this operation will take some time, depending on how much data (models, packages etc.) is required. When the operation completes, you will get the ID of the image back. This image is automatically stored in the Azure Container Registry instance created within your environment.
Test the service locally
The last step is to pull and run the image locally to test it – this is where the Docker installation comes into play.
az ml service create realtime --image-id 49c07d92-c12c-42c8-90ba-1821b5be90aa -n AMLWService
This command will pull the image from the Azure Container Registry (it’s usually a few GB so it will take some time) and start it on our local machine. After the container starts, you should see something similar to this:
To see the actual address of the service, we can run the following:
az ml service usage realtime -i AMLWService
The command gives us some details about the local instance of the service:
As you can see, the service is listening on port 32770. You can visit the Swagger URL to see the description of the API and use the “score” endpoint to do the actual prediction. The output also gives you example CLI and curl commands.
Ok, we have a service, and everything runs. Now let’s see if something actually happens when we call it!
Since our example takes in base64-encoded image data, it’s best to use something other than the command line. I prefer Postman, but your tastes might differ.
Notice that the JSON payload conforms to the schema.json we generated and included in the service manifest – it has an “input_array” field containing a base64-encoded string with our test image.
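If you prefer scripting the call instead of using Postman, a minimal sketch with the requests library could look like this – the port (32770) and the “score” path come from the service usage output above, and the test image path is hypothetical:

import base64

import requests

# encode a test image the same way Postman would put it into the request body
with open("test_image.png", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("ascii")

# call the "score" endpoint of the locally running container
response = requests.post(
    "http://127.0.0.1:32770/score",
    json={"input_array": encoded_image},
)
print(response.json())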
The service seems to be responding:
That’s it. We now have a self-contained Docker image with our ML model, a web service exposing said model, and a local deployment of the image for testing. And everything seems to be performing well!
Some interesting details
All artifacts created using the Azure ML CLI are stored in the Model Management Account and versioned – if we realize we made a mistake (forgot to add something to the conda file, left a bug in the score.py code, etc.), we can always create a new artifact (model, manifest, image, etc.) with the same name. Unfortunately, all artifacts depending on the one we changed need to be rebuilt – so if we create a new version of the service manifest, we need to create a new Docker image. If we want to modify the model, we need to rebuild both the manifest and the image (not to mention update any and all deployments).
We can see the various versions in the Azure portal, under Model Management.
Doing this often can be very time-consuming. For any kind of production-grade deployment, it would be best to create a set of scripts to automate the process as much as possible. To make this easier, the whole process, from model registration to image creation, can be done in a single az ml command:
az ml service create realtime --model-file [model file/folder path] -f [scoring file e.g. score.py] -n [your service name] -s [schema file e.g. service_schema.json] -r [runtime for the Docker container e.g. spark-py or python] -c [conda dependencies file for additional python packages]
The fact that the whole service lives in a Docker container makes testing it extremely easy – just set up a test environment with Docker and you can pull the image, start a container, and run automated tests against the service from within your normal CI pipeline.
Conclusion
This marks the end of the second part in the series. We saw how to register models with Azure, how to describe the exposing service, and how to create a Docker image with the service. We also managed to pull the Docker image, start it, and test that our service is actually doing what we want it to do.
At this point we are ready to take the last step – deploying the service into production and consuming it over the internet.
Read previous article: Operationalizing machine learning models 1/3 – Azure ML Workbench