
June 13, 2023
Evan Seabrook, Director, Cloud Engineering
Kyle Bassett, Partner, Cloud Engineering
Machine learning: all businesses want it, but most tend to have trouble getting started. What’s more, organizations embarking on machine learning (ML) transformation often have difficulty scaling their machine learning operations (MLOps) and continuous training (CT) pipelines.
At PwC Canada, we’re introducing an accelerator that can help you and your business get started with MLOps and take it to the next level on any of the major cloud service providers (GCP, AWS and Azure). That accelerator is called the Data Analytics Workbench (DAW).
DAW is an infrastructure-as-code (IaC) implementation serving as a springboard for data scientists and ML engineers to build an MLOps implementation on the cloud. Cloud-based deployments typically introduce a number of benefits: scalability, enhanced security controls and better integration, to name a few. But there are also some additional benefits unique to MLOps that are worth exploring.
DAW enables ML practitioners to see what’s possible on the cloud, but it also makes those possibilities more accessible in a shorter amount of time and with minimal effort.
DAW allows teams to use common and industry-standard tooling at scale. A brief look at the GCP implementation of DAW helps illustrate this process in more detail:
The DAW implementation for GCP begins when a member of the development team deploys infrastructure using Terraform via GitLab CI. Once the infrastructure is deployed, adjustments can be made in Terraform modules and rolled out automatically through the CI/CD pipeline.
In the case of the GCP implementation, DAW uses Vertex AI Pipelines to orchestrate MLOps end to end. Vertex AI Pipelines is essentially managed Kubeflow Pipelines: tasks that wrap various Vertex AI components are instantiated declaratively in Python, with the ability to declare upstream, downstream and conditional relationships between these tasks.
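To make the idea of declaring upstream, downstream and conditional relationships concrete, here is a minimal sketch in plain Python. It is not the Vertex AI or Kubeflow Pipelines API: the `Pipeline` class, the task names and the 0.9 accuracy threshold are all hypothetical, and the standard-library `graphlib` module stands in for the managed orchestrator.

```python
from graphlib import TopologicalSorter


class Pipeline:
    """Toy declarative pipeline (hypothetical, not the Vertex AI SDK)."""

    def __init__(self):
        self.tasks = {}       # task name -> callable taking the results dict
        self.upstream = {}    # task name -> set of upstream task names
        self.conditions = {}  # task name -> predicate over the results dict

    def task(self, name, fn, after=(), when=None):
        """Declare a task, its upstream tasks and an optional run condition."""
        self.tasks[name] = fn
        self.upstream[name] = set(after)
        if when is not None:
            self.conditions[name] = when
        return name

    def run(self):
        """Run tasks in dependency order; skip a task if its condition fails
        or if any of its upstream tasks was skipped."""
        results, skipped = {}, set()
        for name in TopologicalSorter(self.upstream).static_order():
            if self.upstream[name] & skipped:
                skipped.add(name)
                continue
            condition = self.conditions.get(name)
            if condition is not None and not condition(results):
                skipped.add(name)
                continue
            results[name] = self.tasks[name](results)
        return results, skipped


def build_pipeline(accuracy):
    """Three-step pipeline; deployment is conditional on model accuracy."""
    p = Pipeline()
    prepare = p.task("prepare_data", lambda r: "dataset")
    train = p.task("train_model", lambda r: {"accuracy": accuracy}, after=[prepare])
    p.task(
        "deploy_model",
        lambda r: "endpoint",
        after=[train],
        when=lambda r: r["train_model"]["accuracy"] >= 0.9,  # assumed threshold
    )
    return p


results, skipped = build_pipeline(accuracy=0.93).run()
print(list(results), skipped)
```

In a real pipeline the orchestrator, rather than a local loop, resolves the execution order and evaluates the condition, but the declaration style is similar: each task states what it depends on and when it should run.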
Vertex AI Pipelines comes with a number of out-of-the-box components to help you take advantage of Vertex AI and remove some of the monotony from MLOps. For instance, the HyperparameterTuningJobRunOp allows you to train several models in parallel to minimize (or maximize) a metric of your choosing.
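The idea behind parallel hyperparameter tuning can be sketched with the Python standard library alone. This is an illustration, not the HyperparameterTuningJobRunOp API: `train_and_evaluate` is a hypothetical stand-in for a real training job, the search space is invented, and an exhaustive grid search is swapped in for whatever search strategy the managed service applies.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_evaluate(params):
    """Hypothetical stand-in for a training job; returns a validation loss."""
    lr, depth = params["learning_rate"], params["max_depth"]
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def tune(search_space, objective, goal="minimize", max_parallel=4):
    """Evaluate every combination in the grid in parallel and return
    the best parameters together with their score."""
    names = list(search_space)
    trials = [dict(zip(names, values)) for values in product(*search_space.values())]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        scores = list(pool.map(objective, trials))
    pick = min if goal == "minimize" else max
    best_score, best_params = pick(zip(scores, trials), key=lambda pair: pair[0])
    return best_params, best_score


# Assumed search space, for illustration only.
search_space = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [4, 6, 8]}
best_params, best_score = tune(search_space, train_and_evaluate, goal="minimize")
print(best_params, best_score)
```

Passing `goal="maximize"` selects the highest score instead, which is the appropriate choice for metrics such as accuracy.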
With DAW comes a Vertex AI Pipeline prebuilt with several components useful for teams just getting started with machine learning on Google Cloud.
Terraform, the open-source IaC offering from HashiCorp, allows teams to approve, audit and control component deployment in cloud environments without hindering innovation. Infrastructure written as code enables organizations to reuse certain abstractions (modules) and deploy multiple environments. Organizations can also define policies around their infrastructure to make sure certain industry rules (such as data residency) are respected.
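As one illustration of a data residency policy, the sketch below scans a Terraform plan in its JSON form (as produced by `terraform show -json`) and reports resources planned outside an allowed set of locations. The allowed regions, the attribute names checked and the sample plan are assumptions made for this example, not part of DAW. Many teams would use a dedicated policy-as-code tool for this instead.

```python
import json

# Assumed policy: data must stay in Google Cloud's Canadian regions.
ALLOWED_LOCATIONS = {"northamerica-northeast1", "northamerica-northeast2"}
# Attribute names that commonly carry a resource's location (assumption).
LOCATION_KEYS = ("location", "region")


def residency_violations(plan, allowed=ALLOWED_LOCATIONS):
    """Return (address, attribute, value) for each planned resource whose
    location attribute falls outside the allowed set."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # "after" is null for resources being destroyed, hence the fallbacks.
        after = (rc.get("change") or {}).get("after") or {}
        for key in LOCATION_KEYS:
            value = after.get(key)
            if isinstance(value, str) and value.lower() not in allowed:
                violations.append((rc.get("address"), key, value))
    return violations


# Hand-written sample in the shape of a plan's JSON output (illustrative only).
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "google_storage_bucket.raw_ca",
     "type": "google_storage_bucket",
     "change": {"actions": ["create"],
                "after": {"location": "NORTHAMERICA-NORTHEAST1"}}},
    {"address": "google_storage_bucket.raw_us",
     "type": "google_storage_bucket",
     "change": {"actions": ["create"],
                "after": {"location": "US"}}},
    {"address": "google_storage_bucket.old",
     "type": "google_storage_bucket",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
""")

for address, key, value in residency_violations(sample_plan):
    print(f"{address}: {key}={value} is outside the allowed locations")
```

A check like this could run as a step in the CI/CD pipeline so that a non-compliant plan is rejected before anything is deployed.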
Lastly, IaC makes deprovisioning infrastructure a breeze, allowing the easy and safe destruction of sandbox environments when appropriate.
DAW also includes a highly scalable JupyterLab environment (Vertex AI Workbench). This environment supports multiple simultaneous users and code revision control, which makes it easier to peer review changes, roll back defective code and integrate new code features.
DAW is available for Google Cloud, AWS and Azure. If your organization is already working with one of these providers and wishes to begin its MLOps journey with the same provider, DAW can easily integrate with your existing data platform. It can also be extended to support a multi-cloud data approach.
If your organization currently has its data on-prem, the Cloud Engineering team at PwC Canada can help migrate and modernize your data platform as a precursor to integrating DAW. We’re here to partner with you on your data journey and can help assess and implement what would work best for your unique needs.
For more information on our Data Analytics Workbench and how we can help, contact us.