SKORCH: PyTorch Models Trained with a Scikit-Learn Wrapper

A guide to how simple it is to train PyTorch models with SKORCH

Photo by Kenneth Berrios Alvarez on Unsplash

Have you ever wondered if there is a tool to train and evaluate a PyTorch model in a simple way? Well, that tool exists: it is SKORCH, a scikit-learn wrapper for training PyTorch models.

In this blog, we are going to talk about what SKORCH is, its components and how easy it is to wrap a PyTorch model to train and evaluate it. The blog will be divided into the following sections:

  • What is SKORCH?
  • PyTorch model
  • Training with SKORCH

Let’s get started!

What is SKORCH?

SKORCH is the union of scikit-learn and PyTorch; in other words, it is a wrapper for training, tuning, and optimizing PyTorch models. It is an open-source library, launched in 2017 [1], that combines and enhances the great virtues of the PyTorch and scikit-learn frameworks.

Figure 1. PyTorch + scikit-learn = SKORCH | Image by Author | Logos taken from original sources

PyTorch is one of the most widely used frameworks for developing neural network models; however, some phases take development time and can become somewhat impractical. SKORCH tries to simplify and streamline various processes in the training phase of a PyTorch model. The training loop of a PyTorch model is commonly written as one or more custom functions, and whenever the model needs to be evaluated, or its parameters optimized, yet more functions have to be developed. SKORCH simplifies this entire process: since it is a wrapper based on scikit-learn, it extends the functions that already carry out these tasks.

SKORCH professes the philosophy [2]: "Be a scikit-learn API, hackable, do not hide PyTorch, do not reinvent the wheel".

In Figure 2 we can see the capabilities of PyTorch and scikit-learn that SKORCH combines. From the PyTorch side, it uses the capabilities for prototyping models and handling datasets. From the scikit-learn side, it extends the already familiar functions for training, evaluating, tuning, and optimizing machine learning models. This combination makes SKORCH a powerful tool.

Figure 2. Benefits from PyTorch and scikit-learn in SKORCH | Image by Author | Logos taken from original sources

On the other hand, we can see SKORCH as the "equivalent" of the Keras API, which builds on TensorFlow to simplify and speed up the prototyping of neural network models. In this case, SKORCH serves as the prototyping tool for the training, tuning, and optimization phase of PyTorch models.

Great, so far we know what SKORCH is, what its components are, and the advantages of using it. It is time to see an example to better understand how it works, let's go for it!

PyTorch Model

In order to see how SKORCH works when training a PyTorch model, we are going to create a neural network to classify the well-known wine dataset. So, first we create a simple model for classifying wines from that dataset, which looks as follows:
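
What follows is a minimal sketch of such a model (the exact implementation is in the repository linked below); the hidden-layer size and dropout rate are illustrative defaults, exposed as arguments so that we can tune them later on.

import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, num_units=10, dropout=0.1):
        super(NeuralNet, self).__init__()
        # The wine dataset contains 13 input features
        self.dense0 = nn.Linear(13, num_units)
        self.dropout = nn.Dropout(dropout)
        # The output layer has size 3, one unit per type of wine
        self.dense1 = nn.Linear(num_units, 3)

    def forward(self, x):
        x = F.relu(self.dense0(x))
        x = self.dropout(x)
        # Probabilities over the 3 classes
        return F.softmax(self.dense1(x), dim=-1)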

If you want to access the full implementation, take a look at: https://github.com/FernandoLpz/SKORCH-PyTorch-Wrapper

As we can see, some values are fixed. For practical purposes, I would like to highlight only two of them: the input layer defines 13 features, because the wine dataset contains 13 features, and the output layer has size 3, because there are 3 classes to classify (that is, 3 types of wine).

Perfect, the PyTorch model is ready, it’s time to see how we train this model with SKORCH, let’s go to the next section!

Training with SKORCH

1. Basic Training

Training through SKORCH can be as simple or as elaborate as we need; for practical purposes we will go gradually. So, a basic and simple way to train the model defined in the previous section would be the following lines of code:
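
Here is a sketch of that basic training; the hyperparameter values are the fixed ones used in this example (the exact script is in the repository), and the import path of the model (model.py) is an assumption:

import numpy as np
import torch
from sklearn.datasets import load_wine
from skorch import NeuralNetClassifier
from model import NeuralNet  # the PyTorch model from the previous section

# Load the wine dataset (13 features, 3 classes); skorch expects
# float32 inputs and int64 targets
X, y = load_wine(return_X_y=True)
X, y = X.astype(np.float32), y.astype(np.int64)

# Wrap the PyTorch model, fixing epochs, learning rate,
# batch size and optimizer
nn_clf = NeuralNetClassifier(
    NeuralNet,
    max_epochs=10,
    lr=0.01,
    batch_size=12,
    optimizer=torch.optim.RMSprop,
)

# fit() runs the entire training phase
nn_clf.fit(X, y)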

Let's analyze it step by step. First we import the PyTorch model (which was defined in the previous section), together with the NeuralNetClassifier class that will serve as a wrapper for it. This class receives a series of important parameters: the PyTorch model, the number of epochs, the learning rate, the batch size, and the optimizer. Obviously, these are not the only parameters we can define in this class, but for practicality we only show the ones already mentioned in this example. Finally, we call the "fit" method, which is in charge of performing the entire training phase.

You may be wondering: "what about the split into train and validation sets?" Well, the NeuralNetClassifier class takes care of this as well. By default, it performs a stratified split of the data (based on StratifiedKFold), with a ratio of 80% for training and 20% for validation; we will see shortly how this split can be customized. With that said, this would be the output:

epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        9.6552       0.4167        9.2997  0.0124
      2        9.6552       0.4167        9.2997  0.0109
      3        9.6552       0.4167        9.2997  0.0107
      4        9.6552       0.4167        9.2997  0.0109
      5        9.6552       0.4167        9.2997  0.0116
      6        9.6552       0.4167        9.2997  0.0119
      7        9.6552       0.4167        9.2997  0.0114
      8        9.6552       0.4167        9.2997  0.0113
      9        9.6552       0.4167        9.2997  0.0115
     10        9.6552       0.4167        9.2997  0.0115

The structure of the output is as shown in the previous snippet. As we can see, by default it reports the loss on the training and validation sets, as well as the accuracy on the validation set and the execution time per epoch. Since we have not tuned the model, the results are extremely poor; however, we will fix this in the next examples.
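
As for the internal train/validation split mentioned above, it can be customized through the wrapper's train_split parameter. A minimal sketch, assuming a recent skorch release (where the helper class is named ValidSplit; older releases call it CVSplit):

from skorch.dataset import ValidSplit  # named CVSplit in older skorch releases

# Explicitly request the default behaviour: hold out 1/5 of the data
# (a stratified 80/20 split) for validation at every epoch
nn_clf = NeuralNetClassifier(
    NeuralNet,
    max_epochs=10,
    train_split=ValidSplit(cv=5, stratified=True),
)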

2. Pipeline: Scaler + Training

In the previous point, we saw how to train the model with SKORCH in a basic way. However, data preprocessing is a very important phase that is always carried out prior to training. In this case, we are going to perform a very simple preprocessing step: we will only scale the data and then carry out the training. For this, we make use of the scikit-learn Pipeline module, so we would have the following:
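
A sketch of that pipeline, reusing the data loading and wrapper from the previous snippet (the step names "scaler" and "nn" are arbitrary labels):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The same wrapper as before, with the same fixed values
nn_clf = NeuralNetClassifier(
    NeuralNet,
    max_epochs=10,
    lr=0.01,
    batch_size=12,
    optimizer=torch.optim.RMSprop,
)

# Scale the features, then train the wrapped PyTorch model;
# the wrapper behaves like any other scikit-learn estimator
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('nn', nn_clf),
])

pipeline.fit(X, y)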

First we import the Pipeline and StandardScaler modules from scikit-learn. The wrapper is initialized exactly the same as in the previous point (with fixed values); the interesting part comes when the Pipeline is initialized, containing the StandardScaler() step as well as the wrapped PyTorch model. Running this we get:

epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.4663       0.8889        0.3528  0.0124
      2        0.0729       0.8889        0.6507  0.0111
      3        0.0420       0.9167        0.4564  0.0118
      4        0.0101       0.9167        0.3142  0.0116
      5        0.0041       0.9167        0.3321  0.0119
      6        0.0028       0.9167        0.3374  0.0129
      7        0.0022       0.9167        0.3376  0.0111
      8        0.0017       0.9167        0.3384  0.0122
      9        0.0014       0.9167        0.3373  0.0135
     10        0.0012       0.9167        0.3378  0.0118

It is important to highlight that the results improved notably, thanks to scaling the data prior to the training phase.

So far we have seen how to train a PyTorch model (with fixed parameters) inside an execution Pipeline, as if it were a scikit-learn module. However, how could we add other evaluation metrics, such as accuracy or balanced accuracy, to the SKORCH module? Well, this is where we make use of callbacks.

3. Pipeline: Scaler + Training + Callbacks

Callbacks are an extension of SKORCH that allows us to add other functions to the NeuralNetClassifier wrapper. For example, if we want the evaluation metric to be balanced_accuracy or ROC AUC or any other classification metric, this can be done through callbacks. So, introducing a callback to calculate the accuracy and balanced accuracy of the model within the pipeline would be as follows:
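
A sketch of the callback setup, using the scikit-learn scorer names "accuracy" and "balanced_accuracy" (the rest of the pipeline is unchanged from the previous snippet):

from skorch.callbacks import EpochScoring

# Score the model on both metrics at the end of every epoch;
# lower_is_better=False because we want to maximize these metrics
accuracy = EpochScoring(scoring='accuracy', lower_is_better=False)
balanced_accuracy = EpochScoring(scoring='balanced_accuracy', lower_is_better=False)

nn_clf = NeuralNetClassifier(
    NeuralNet,
    max_epochs=10,
    lr=0.01,
    batch_size=12,
    optimizer=torch.optim.RMSprop,
    callbacks=[accuracy, balanced_accuracy],
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('nn', nn_clf),
])

pipeline.fit(X, y)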

So, as we can see, we import the EpochScoring callback. To make use of it, we only have to initialize it, passing as an argument the name of the metric we want to use; in this case we create one callback for "balanced_accuracy" and another for "accuracy". We also have to set the parameter "lower_is_better" to "False", because our problem seeks to maximize these metrics, not minimize them.

So the result of executing the previous snippet would look something like this:

epoch    accuracy    balanced_accuracy    train_loss    valid_acc    
-------  ----------  -------------------  ------------  -----------  
      1      0.9722               0.9762        0.4780       0.9722        
      2      1.0000               1.0000        0.0597       1.0000        
      3      1.0000               1.0000        0.0430       1.0000        
      4      1.0000               1.0000        0.0144       1.0000        
      5      1.0000               1.0000        0.0110       1.0000        
      6      1.0000               1.0000        0.0083       1.0000        
      7      1.0000               1.0000        0.0067       1.0000        
      8      1.0000               1.0000        0.0058       1.0000        
      9      1.0000               1.0000        0.0047       1.0000        
     10      1.0000               1.0000        0.0039       1.0000        

Finally what remains to be seen is how to perform a Grid Search using the scikit-learn modules, let’s go for it!

4. GridSearch: Pipeline + Scaler + Training + Callbacks

To perform a Grid Search, we only need to import the corresponding scikit-learn module. Performing the Grid Search is exactly the same as with the classic machine learning models from scikit-learn; the only difference lies in the definition of the parameters for the grid.
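
A sketch of the grid definition and search; the candidate values below are illustrative (they include the winning combination reported further down), and the exact grid is in the repository:

import torch
from sklearn.model_selection import GridSearchCV

# 'nn__' targets parameters of the wrapper inside the pipeline;
# 'nn__module__' reaches through to the PyTorch model's arguments
params = {
    'nn__lr': [0.1, 0.01],
    'nn__max_epochs': [10, 20],
    'nn__module__num_units': [10, 20],
    'nn__module__dropout': [0.1, 0.5],
    'nn__optimizer': [torch.optim.Adam, torch.optim.RMSprop],
}

# verbose=0 silences the per-epoch log during the search
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('nn', NeuralNetClassifier(NeuralNet, verbose=0)),
])

gs = GridSearchCV(pipeline, params, scoring='balanced_accuracy', cv=3)
gs.fit(X, y)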

As we can see, the parameters have a particular aspect: we add the prefixes "nn__" and "nn__module__". These prefixes tell the wrapper whether a parameter belongs to the definition of the PyTorch model or to the training phase. We use the prefix "nn__" when we refer to parameters of the training phase, and "nn__module__" when we refer to parameters of the PyTorch model. It is important to note that the name "nn" is the label given to the wrapper when the Pipeline is instantiated.

So if we want to know what the best parameters were, we can do it easily:

print(gs.best_params_)
{'nn__lr': 0.1, 'nn__max_epochs': 10, 'nn__module__dropout': 0.1, 'nn__module__num_units': 10, 'nn__optimizer': <class 'torch.optim.adam.Adam'>}

If you want to access the full implementation, take a look at: https://github.com/FernandoLpz/SKORCH-PyTorch-Wrapper

Conclusion

In this blog we have seen what SKORCH is and what its components are. We have also seen how to implement the NeuralNetClassifier wrapper to train a PyTorch model in a very simple way.

In my opinion, SKORCH is here to stay. Sometimes we need to prototype PyTorch models quickly and flexibly, and SKORCH does this wonderfully.

References

[1] https://skorch.readthedocs.io/en/latest/index.html

[2] https://www.youtube.com/watch?v=Qbu_DCBjVEk

