Using a scikit-learn model in a Java app
It’s usually a challenge to migrate a large codebase from one programming language to another, and sometimes it doesn’t even make sense. If you are using Java and need to apply a machine learning model trained with scikit-learn, this tutorial may help you.
We are going to use three libraries for this purpose: scikit-learn, sklearn2pmml, and pmml4s. To help manage data, we’re also using other libraries, but you can replace them if needed.
What the code looks like
Exporting the model takes a few simple steps, described below. In this example, we are using a regressor in the PMMLPipeline class, but you can use a classifier if that’s your purpose. Don’t forget to add scikit-learn and sklearn2pmml to your Python project dependencies.
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
import pandas as pd

# fetching data example
df = load_diabetes()
X = pd.DataFrame(columns = df.feature_names, data = df.get('data'))
y = pd.DataFrame(columns = ['target'], data = df.get('target'))

# here you can use the key classifier, if suitable
pipeline = PMMLPipeline([ ('regressor', DecisionTreeRegressor()) ])

# training the model
pipeline.fit(X, y)

# exporting the model
sklearn2pmml(pipeline, 'model.pmml', with_repr = True)
Make sure to add column names to your dataset; otherwise you won’t be able to identify the features in the Java code. This example is deliberately simple, but if you have a more complex problem, you can use more of the library’s functionality by looking at the SkLearn2PMML documentation.
Making predictions in the Java code is also very simple, as the example below shows. Here we’re using the pmml4s library, so don’t forget to add it to your dependencies.
import org.pmml4s.model.Model;
import java.util.*;

public class Main {
    // Load the exported model from the classpath
    private final Model model = Model.fromFile(Main.class.getClassLoader().getResource("model.pmml").getFile());

    public Double getRegressionValue(Map<String, Double> values) {
        // Arrange the feature values in the order the model expects
        Object[] valuesMap = Arrays.stream(model.inputNames())
                .map(values::get)
                .toArray();

        Object[] result = model.predict(valuesMap);
        return (Double) result[0];
    }

    public static void main(String[] args) {
        Main main = new Main();
        Map<String, Double> values = Map.of(
                "age", 20d,
                "sex", 1d,
                "bmi", -100d,
                "bp", -200d,
                "s1", 1d,
                "s2", 2d,
                "s3", 3d,
                "s4", 4d,
                "s5", 5d,
                "s6", 6d
        );

        double predicted = main.getRegressionValue(values);
        System.out.println(predicted);
    }
}
In this Java code, the main method we need is predict, and you can simply pass an array to it. Simplify the code as much as you wish, but pay attention to the order in which you add the features to this array; this is why we used a Map. The inputNames method returns the feature names in exactly the order you need to follow.
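The ordering step can also be isolated into a small helper. The sketch below is only an illustration and is not part of pmml4s: FeatureOrder and toOrderedArray are hypothetical names, and the inputNames array stands in for whatever model.inputNames() returns. It also fails fast when a feature is missing, because Map.get would otherwise silently put a null into the array.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper, not part of pmml4s: arranges feature values in the
// order given by inputNames (for example, the result of model.inputNames()).
final class FeatureOrder {

    static Object[] toOrderedArray(String[] inputNames, Map<String, Double> values) {
        Object[] ordered = new Object[inputNames.length];
        for (int i = 0; i < inputNames.length; i++) {
            Double value = values.get(inputNames[i]);
            if (value == null) {
                // Fail fast instead of passing a null to the model.
                throw new IllegalArgumentException("Missing feature: " + inputNames[i]);
            }
            ordered[i] = value;
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Stand-in for model.inputNames(); the real order comes from the PMML file.
        String[] inputNames = {"age", "sex", "bmi"};
        // The map can be filled in any order; the helper restores the model's order.
        Map<String, Double> values = Map.of("bmi", 0.05, "age", 0.02, "sex", -0.04);
        System.out.println(Arrays.toString(toOrderedArray(inputNames, values)));
    }
}
```

With a helper like this, the array passed to predict always follows the model's input order, no matter how the Map was built.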
Testing that it works
For testing purposes, I exported the same test data that I used in the Python code and imported it into the Java code. There are a variety of metrics that we can use to measure our machine learning models; here, MAE (Mean Absolute Error) is used because it’s very simple and intuitive.
Looking at the figure above, we can see that the MAE is the same in both programs. That means both are predicting the same values, which is a simple way to check that you exported your model correctly. You can use any other metric for this check as well.
The code used in this test is available at:
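To reproduce this kind of check on the Java side, MAE is just the average of the absolute differences between the expected and the predicted values. A minimal sketch follows; MaeCheck and meanAbsoluteError are hypothetical names, and the sample numbers are made up purely to exercise the method (they are not taken from the diabetes dataset).

```java
// Hypothetical helper for comparing predictions; the names are illustrative.
final class MaeCheck {

    // MAE = mean of |actual[i] - predicted[i]| over all samples.
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        // Made-up values, only to exercise the method.
        double[] actual = {100.0, 150.0, 200.0};
        double[] predicted = {110.0, 140.0, 200.0};
        System.out.println(meanAbsoluteError(actual, predicted));
    }
}
```

In practice, the predicted array would be filled by calling getRegressionValue for each test row, and the result compared with scikit-learn's mean_absolute_error computed on the same rows.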
- Python: https://colab.research.google.com/drive/1OFdBfKQrqRpRb0jWZtq7kgv7M3JRfs-o?usp=sharing#scrollTo=DuJq1np-ln1q
- Java: https://replit.com/@JhonatanSilva3/MachineLearningPipeline#Main.java
Hope this helps you somehow (: