Changed prediction to run with multithreading #54
skjerns wants to merge 6 commits into nok:stable from skjerns:patch-1
Hello @skjerns, this is great! I will merge your PR and adapt it to the new major release. Until it's done I will keep this PR open. Best, Darius
Meanwhile I have found another solution that speeds things up to almost real-time predictions. I altered the example for C:

    int main(int argc, const char * argv[]) {
        // the arguments must form complete rows of n_features values each
        if ((argc-1) % n_features != 0) {
            printf("Need to supply N x %d features flattened, %d were given\n", n_features, argc-1);
            return 1;
        }
        double features[n_features];
        int n_rows = (argc-1) / n_features;
        for (int row = 0; row < n_rows; row++) {
            printf("row: %d\n", row);
            for (int i = 0; i < n_features; i++) {
                features[i] = atof(argv[i + row*n_features + 1]);
            }
            // calculate outputs for debugging
            int class_idx = predict_class_idx(features);
            // same as calling label = predict(features)
            int label = labels[class_idx];
            // now we print the results
            printf("labels: ");
            for (int i = 0; i < n_classes; i++) {
                printf("%d ", labels[i]);
            }
            printf("\n");
            printf("class_idx: %d\n", class_idx);
            printf("label: %d", label);
            printf("\n\n");
        }
        return 0;
    }
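The speedup from the altered `main` comes from starting one process for many rows instead of one process per row. A minimal Python sketch of the calling side follows. It assumes the binary prints real newlines in the `row:` / `labels:` / `class_idx:` / `label:` layout shown above. `parse_batch_output` and `predict_batch` are hypothetical helper names for illustration, not part of the library.

```python
import subprocess


def parse_batch_output(text):
    """Parse the per-row blocks printed by the altered main() into dicts."""
    results = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between rows
        key, value = key.strip(), value.strip()
        if key == "row":
            current = {"row": int(value)}
            results.append(current)
        elif current is not None and key == "labels":
            current["labels"] = [int(v) for v in value.split()]
        elif current is not None and key in ("class_idx", "label"):
            current[key] = int(value)
    return results


def predict_batch(binary_path, rows):
    """Hypothetical helper: one process call for all rows, features flattened."""
    args = [binary_path] + [repr(float(v)) for row in rows for v in row]
    out = subprocess.check_output(args, universal_newlines=True)
    return [r["label"] for r in parse_batch_output(out)]
```

Because every feature value travels as a command-line argument, the batch size is bounded by the operating system's limit on argument length, so very large inputs would likely need to be split into chunks.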
In the next release all internal predictions will be multiprocessed by default. Here is the relevant part:
Yes, SIMD operations would be nice. But for now I prefer a simple and intuitive starting point where a developer can change and extend the generated source code easily. Nevertheless, I see and understand the need, so I would suggest that we create an additional interactive example (something like that) where we demonstrate the customization and the final benefit. The current scaffold of a template is here. What do you think?
Thanks for the note! That sounds great. I removed all checks that are related to the operating system:
Great! It might be handy to include a [...] Edit: Ah, I guess that's done by DEPENDENCIES.
Great! Nice.
I'll leave it up to you. Having the source code of individual language templates would be feasible, I guess?
I saw that `predict` or `integrity_score` is running quite slowly. I've added functionality to let it run with threading, making it much faster.

It adds a dependency on `joblib`; however, this is already a dependency of `sklearn`, so no new dependencies are really added. This makes the code ~8x faster (with 8 threads).

I've changed the call from `Shell.check_output` to `subprocess.check_output`. `Shell` is calling `subprocess.check_output` in the background anyway, but like this we get another speedup of ~3-4x, so a total speedup of ~30x is possible.
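As a rough illustration of the idea (not the code from this PR), the sketch below runs several `subprocess.check_output` calls on a thread pool. It swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for `joblib` so it runs without extra packages. `fake_cmd` is a hypothetical stand-in for the compiled estimator binary: it spawns a small Python process that prints the sum of its arguments.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one prediction command and return its stripped stdout."""
    return subprocess.check_output(cmd, universal_newlines=True).strip()


def run_many(cmds, n_threads=8):
    """Run the commands on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(run_one, cmds))


def fake_cmd(sample):
    # Stand-in for the compiled estimator (an assumption for this sketch):
    # a tiny Python process that prints the sum of its arguments.
    code = "import sys; print(int(sum(float(a) for a in sys.argv[1:])))"
    return [sys.executable, "-c", code] + [str(v) for v in sample]


samples = [[1, 2], [3, 4], [5, 6]]
predictions = run_many([fake_cmd(s) for s in samples], n_threads=4)
```

Threads are sufficient here because each worker spends its time waiting on a child process, so the interpreter lock is not the bottleneck.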
Example:
I've also seen that `integrity_score` runs perfectly fine on Windows, given that `gcc` is installed (and the hard-coded blocking of Windows is removed). Do you think we can remove the blocking of the function for Windows platforms?
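One possible replacement for a hard-coded platform block, sketched here as an assumption and not as the change made in this PR, is to check for the compiler itself. `compiler_available` is a hypothetical helper name; `shutil.which` is part of the Python standard library.

```python
import shutil


def compiler_available(name="gcc"):
    """Return True if `name` resolves to an executable on the PATH."""
    return shutil.which(name) is not None
```

With a check like this, the function would be skipped only on machines without `gcc`, regardless of the operating system.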