Automated Machine Learning (AutoML) is one of the most talked-about subfields of Data Science right now. To people unfamiliar with machine learning it sounds fantastic, but it worries many practicing Data Scientists. Media coverage of AutoML suggests the technology could drastically transform the way we produce models by removing the need for Data Scientists altogether. In principle, using AutoML to automate the process entirely is a brilliant idea; in practice, it introduces several opportunities for bias and misunderstanding. What AutoML does well is address a practical problem: training machine learning models is time-consuming, and AutoML makes it easier to identify the best strategy for your scenario and dataset.
ML.NET is an open-source, cross-platform machine learning framework that lets .NET developers integrate custom machine learning into .NET applications. Last year, Microsoft updated the AutoML implementation in its Model Builder and ML.NET CLI tools to build on Microsoft Research's Neural Network Intelligence (NNI) and Fast and Lightweight AutoML (FLAML) technologies.
New AutoML Updates
Training machine learning models is a time-consuming and iterative task. Automated Machine Learning (AutoML) automates that process by making it easier to find the best algorithm for your scenario and dataset. AutoML is the backend that powers the training experiences in Model Builder and the ML.NET CLI. Last year we announced updates to the AutoML implementation in our Model Builder and ML.NET CLI tools based on the Neural Network Intelligence (NNI) and Fast and Lightweight AutoML (FLAML) technologies from Microsoft Research. These updates provided a few benefits and improvements over the previous solution, including:
- Increase in the number of models explored.
- Improved time-out error rate.
- Improved performance metrics (for example, accuracy and R-squared).
The Experiment API
An experiment is a collection of training runs or trials. Each trial produces information about itself such as:
- Evaluation metrics: The metrics used to assess the predictive capabilities of a model.
- Pipeline: The algorithm and hyperparameters used to train a model.
The Experiment API includes a set of AutoML defaults, making it easier to add AutoML to a training pipeline. In the code snippet below, dataPrepPipeline is a sequence of transforms that gets the data into the proper format for training; the AutoML components required to train a regression model are then appended to that pipeline.
The same idea holds for other supported scenarios, such as classification. When building an experiment from the training pipeline, you can set the length of training, the training and validation sets, and the evaluation metric to optimize. Once the pipeline and experiment are configured, call the Run method to begin training.
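The snippet assumes a dataPrepPipeline has already been defined. As a rough illustration only, such a pipeline for a taxi-fare dataset might look like the sketch below; the column names and transforms are assumptions for this example (not part of the original snippet), and it assumes the AutoML regression component reads its features from the default "Features" column.
// Hypothetical data preparation pipeline for a taxi-fare dataset.
// Column names (vendor_id, passenger_count, trip_distance) are assumed;
// adjust the transforms to your own dataset's schema.
var dataPrepPipeline =
    mlContext.Transforms.Categorical.OneHotEncoding("vendor_id")
        .Append(mlContext.Transforms.Concatenate("Features",
            "vendor_id", "passenger_count", "trip_distance"));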
// Configure AutoML pipeline
var experimentPipeline =
    dataPrepPipeline
        .Append(mlContext.Auto().Regression(labelColumnName: "fare_amount"));

// Configure experiment
var experiment = mlContext.Auto().CreateExperiment()
    .SetPipeline(experimentPipeline)
    .SetTrainingTimeInSeconds(50)
    .SetDataset(trainTestSplit.TrainSet, validateTestSplit.TrainSet)
    .SetEvaluateMetric(RegressionMetric.RSquared, "fare_amount", "Score");

// Run experiment
var result = await experiment.Run();
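Once the run completes, the result can be inspected and the best model saved with the standard ML.NET model-saving API. The sketch below is illustrative only; the Metric and Model property names on the result are assumptions about the result type rather than something shown in the original snippet.
// Illustrative only: report the optimized metric and persist the best model.
// result.Metric and result.Model are assumed property names.
Console.WriteLine($"Best R-squared: {result.Metric}");
mlContext.Model.Save(result.Model, trainTestSplit.TrainSet.Schema, "AutoMLModel.zip");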
What’s next for ML.NET?
We’re actively working towards the areas outlined in our roadmap.
Deep Learning
A few months ago we shared our plan for deep learning. A significant portion of that plan revolves around improving the ONNX consumption experience and enabling new scenarios through TorchSharp, a .NET library that provides access to the native library that powers PyTorch. Some of the progress we've made towards this plan includes:
Enabled global GPU flags for ONNX inferencing. Prior to this update, when you wanted to use the GPU for inferencing with ONNX models, the FallbackToCpu and GpuDeviceId flags in the ApplyOnnxModel transform were not saved as part of the pipeline. As a result, you had to fit the pipeline every time. We've made these flags accessible as part of the MLContext so you can save them as part of your model (a rough sketch follows after these updates).
TorchSharp targets .NET Standard. TorchSharp originally targeted .NET 5. As part of our work in enabling TorchSharp integrations into ML.NET, we’ve updated TorchSharp to target .NET Standard.
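Returning to the global GPU flags, a minimal sketch of setting them on the MLContext is shown below; the ONNX model path and column names are placeholders, and the exact property and parameter names should be verified against the current API.
// Sketch: set global GPU flags on the MLContext so they travel with the saved model.
// "model.onnx", "input", and "output" are placeholders for your own model and columns.
var mlContext = new MLContext();
mlContext.GpuDeviceId = 0;       // use GPU 0 for ONNX inferencing
mlContext.FallbackToCpu = true;  // fall back to CPU if no GPU is available

var onnxPipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "output" },
    inputColumnNames: new[] { "input" },
    modelFile: "model.onnx");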
We’re excited to share with you the progress we’ve made integrating TorchSharp with ML.NET in the coming weeks. Stay tuned for the blog post.
.NET DataFrame
Clean and representative data improves the performance of a model, so data analysis, cleaning, and preparation for training are a crucial stage in the machine learning workflow. A few years back, we introduced the DataFrame type to .NET as a preview in the Microsoft.Data.Analysis NuGet package. The DataFrame is still in preview. Because it's important for developers to have tools for data cleaning and processing, we've started to organize and prioritize feedback so we can address existing stability and developer experience pain points. That feedback is being organized as part of a GitHub issue.
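For readers who haven't tried it, a minimal sketch of loading and cleaning data with the DataFrame type might look like the following; the file name and the specific cleaning steps are assumptions for illustration.
// Minimal sketch using the Microsoft.Data.Analysis DataFrame (preview).
// "taxi-fare.csv" is a placeholder file name.
using Microsoft.Data.Analysis;

var df = DataFrame.LoadCsv("taxi-fare.csv");   // load a CSV into a DataFrame
df = df.DropNulls();                           // drop rows with missing values
DataFrame summary = df.Description();          // numeric per-column summary statistics
Console.WriteLine($"Rows after cleaning: {df.Rows.Count}");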
MLOps
Machine Learning Operations (MLOps) is like DevOps for the machine learning lifecycle. This includes things like model deployment & management and data tracking, which help with productionizing machine learning models. We’re constantly evaluating ways to improve this experience with ML.NET.
Recently we published a blog post that guides you through the process of setting up Azure Machine Learning Datasets, training an ML.NET model using the ML.NET CLI, and configuring a retraining pipeline with Azure DevOps. For more details, check out the post Train an ML.NET model in Azure ML.
Conclusion
Machine Learning Operations (MLOps) brings DevOps-style practices such as model deployment, management, and data tracking to the machine learning lifecycle, and Microsoft continually looks for ways to improve that experience in ML.NET. The updates described here, from the new AutoML Experiment API to the deep learning, DataFrame, and MLOps work on the roadmap, should help developers improve their machine learning workflow.