Make it to prod for ML with Kubernetes, Kubeflow and seldon-core

Photo by Connor McSheffrey on Unsplash

A big challenge businesses face is deploying machine learning models in production environments. It requires dealing with a complex set of moving parts across different pipelines. Once models are developed, they need to be trained, deployed, monitored and kept track of.

This post describes how AWS EKS, Kubeflow Pipelines and seldon-core can be used to productionize the deployment of ML models, including important components such as a CI/CD pipeline, a model registry and a scalable inference layer built on a modern microservice architecture. All the necessary steps for creating this architecture are available on GitHub.

There are essentially three stages in the ML execution pipeline: preprocessing, followed by training, and finally inference.
In the first stage, raw data is received. It can be unstructured, semi-structured or structured, and the important features are extracted from it. In the training stage, the model is built by determining how much weight each of those features should carry. Once the model has satisfactory performance and accuracy, it is exposed to users and partners to reap the benefits of the endeavour. That in itself produces more data, which can be used to improve the model in the next iteration.

For a company that is not internet native, taking advantage of machine learning technologies is new and intriguing. However, there are challenges. Such companies have decades of software development behind them and have made massive investments in building their infrastructure. Years of experience, gathered earlier by some companies and later by others, have been accumulated in process engineering, tooling, infrastructure and monitoring. On top of that, there are regulations that have to be followed. Therefore, it is critical to think of ML as part of an ecosystem that it has to be integrated into, not as an independent software system that can live on its own.

ML projects usually start in a research-style environment, where experiments are run iteratively on small data dumps in local Jupyter notebooks. Moving from there to production can be a gruelling task. When moving from a small subset of data to real production, the workload is several orders of magnitude bigger. Data might be streaming and come with certain time constraints. Integration with operational databases is needed without causing disruptions. Analyzing this data and extracting meaningful insights is suddenly not easy anymore and requires much more processing power.

Another challenge in the local setup is keeping track of which experiments yield optimal results. How can one go back to the version that produced these results? One also needs to be able to test several approaches at the same time to get the maximum benefit from the endeavour. And once all that is done, one needs to monitor the model in real time, check for performance drift, and incorporate the feedback loop into the pipeline.
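The bookkeeping problem can be illustrated with a minimal sketch in plain Python. It assumes that a run is identified by a hash of its parameters plus a data version, and that a single metric decides which run is best. The names `ExperimentLog`, `record` and `best`, as well as the example parameters, data version and scores, are made up for illustration; dedicated experiment tracking tools do this far more thoroughly.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ExperimentLog:
    """In-memory record of runs, keyed by a hash of their configuration."""
    runs: dict = field(default_factory=dict)

    def record(self, params: dict, data_version: str, metric: float) -> str:
        # Hash the sorted parameters and data version so that the same
        # setup always maps to the same run id.
        payload = json.dumps({"params": params, "data": data_version}, sort_keys=True)
        run_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.runs[run_id] = {"params": params, "data": data_version, "metric": metric}
        return run_id

    def best(self) -> tuple:
        # Return (run_id, run) for the run with the highest metric value.
        return max(self.runs.items(), key=lambda item: item[1]["metric"])


# Hypothetical runs: parameters, data version and scores are invented.
log = ExperimentLog()
first = log.record({"max_depth": 3}, "dump-2021-01", 0.81)
second = log.record({"max_depth": 5}, "dump-2021-01", 0.86)
best_id, best_run = log.best()
print(best_id, best_run["params"])
```

Because the run id is derived from the configuration, re-running the same setup on the same data lands on the same entry, which is what makes it possible to go back to the version that produced a given result.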

To achieve this, many teams need to collaborate: data needs to be available and up to date, models need to be developed, application developers need to incorporate this new step into their flow, and DevOps engineers need to enable deployment. On top of that, the question arises of where the models are trained: should it be done in production, or in earlier stages? And how are they promoted to production?

The challenges mentioned above are potential pitfalls for the project. Maybe the training process is too memory- or storage-intensive to run effectively alongside a partial or whole production workload. Maybe data silos within the company make it impossible to run a full training execution on current data. Maybe inter-team communication overhead makes project execution slow and costly. The result is that 80% of ML initiatives do not make it into production, failing to provide value to stakeholders.

Tooling

Kubeflow Pipelines is a great tool for orchestrating ML pipelines. It is robust, reliable and provides a great developer experience (DX). The pipelines themselves are defined as code, which makes it possible to track and roll back changes. The tool supports multiple execution engines and is open source.

S3 is used in our case as a poor man’s model registry. We push our trained models there and make them available for the final stage, namely staging. There are currently excellent free-to-use or open-source tools for model registries, ModelDB and the MLflow Model Registry, to name just a few.
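Using a bucket as a registry mostly comes down to agreeing on a naming convention for the stored artifacts. The sketch below shows one possible convention; the key layout `models/<name>/v<version>/`, the bucket name `my-ml-models`, the model name `churn` and the helper functions are all assumptions for illustration, not the layout used in the accompanying repository. It does not talk to AWS: the actual upload would be done with an S3 client.

```python
import hashlib
import json


def model_key(name: str, version: int, filename: str = "model.joblib") -> str:
    """Build the object key for one model version (the layout is an assumption)."""
    return f"models/{name}/v{version}/{filename}"


def manifest(name: str, version: int, artifact: bytes, bucket: str) -> dict:
    """Metadata kept next to the artifact so a version can be verified later."""
    return {
        "name": name,
        "version": version,
        "uri": f"s3://{bucket}/{model_key(name, version)}",
        "sha256": hashlib.sha256(artifact).hexdigest(),
    }


def latest_version(keys: list, name: str) -> int:
    """Return the highest version found among existing keys, or 0 if none."""
    prefix = f"models/{name}/v"
    versions = [int(k[len(prefix):].split("/")[0]) for k in keys if k.startswith(prefix)]
    return max(versions, default=0)


# Hypothetical listing of keys already present in the bucket.
existing = ["models/churn/v1/model.joblib", "models/churn/v2/model.joblib"]
next_version = latest_version(existing, "churn") + 1
entry = manifest("churn", next_version, b"serialized-model-bytes", bucket="my-ml-models")
print(json.dumps(entry, indent=2))
```

Never overwriting an existing version and storing a checksum alongside the artifact gives a small part of what a dedicated registry provides: a stable URI per version that later stages can refer to.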

Seldon-core is a powerful tool to deploy and monitor ML models on Kubernetes clusters. It supports multiple ML frameworks and deployment setups.
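In Seldon Core, a deployment is described by a `SeldonDeployment` custom resource. As a rough sketch, the snippet below builds such a resource as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts just like YAML. It assumes the Seldon Core v1 API (`machinelearning.seldon.io/v1`) and the prepackaged scikit-learn server (`SKLEARN_SERVER`); the deployment name, namespace and model URI are hypothetical, and the credentials needed to read from the bucket are not shown.

```python
import json


def seldon_deployment(name: str, model_uri: str,
                      namespace: str = "default", replicas: int = 1) -> dict:
    """Build a minimal SeldonDeployment with a single predictor."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictors": [
                {
                    "name": "default",
                    "replicas": replicas,
                    # The inference graph: one node served by the
                    # prepackaged scikit-learn model server.
                    "graph": {
                        "name": "classifier",
                        "implementation": "SKLEARN_SERVER",
                        "modelUri": model_uri,
                    },
                }
            ]
        },
    }


# Hypothetical model location; replace with a real artifact URI.
resource = seldon_deployment(
    name="churn-model",
    model_uri="s3://my-ml-models/models/churn/v3",
    namespace="seldon",
    replicas=2,
)
print(json.dumps(resource, indent=2))
```

Generating the manifest from code instead of editing it by hand makes it straightforward for a pipeline step to point the deployment at whichever model version was just registered.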

Modern ML Pipeline

The picture shows what an automated pipeline might look like.
The starting point is a set of data sources. The pipeline orchestrates preprocessing, model training and registration of the model in the model registry. Each of these steps might contain multiple substeps and might use a different set of technologies. For instance, preprocessing can use a combination of Spark and simple Python transformations. Training can involve more than one model and multiple frameworks, e.g. scikit-learn, TensorFlow, etc. Lastly, all the produced models need to be registered in the registry.
The final step in the pipeline, after successful completion of the previous ones, is deployment, done using seldon-core and Kubernetes manifests.
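The ordering of these steps can be sketched as a small dependency graph. The snippet below is plain Python using the standard library's `graphlib` (Python 3.9+), not the Kubeflow Pipelines SDK; the step names and the placeholder actions are hypothetical and only illustrate how each step waits for the ones it depends on.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
PIPELINE = {
    "preprocess": set(),
    "train_sklearn": {"preprocess"},
    "train_tensorflow": {"preprocess"},
    "register_models": {"train_sklearn", "train_tensorflow"},
    "deploy": {"register_models"},
}


def run_pipeline(steps: dict, actions: dict) -> list:
    """Run every step after its dependencies and return the execution order."""
    executed = []
    for step in TopologicalSorter(steps).static_order():
        # An exception raised here stops the loop, so later steps
        # (including deployment) never run after a failure.
        actions[step]()
        executed.append(step)
    return executed


# Placeholder actions; real steps would launch Spark jobs, training code, etc.
actions = {step: (lambda: None) for step in PIPELINE}
order = run_pipeline(PIPELINE, actions)
print(order)
```

A real orchestrator adds what this sketch leaves out, such as running independent steps like the two training jobs in parallel, retrying failed steps and passing artifacts between steps.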

Summary

Resources

Senior ML/Data Engineer