28,190 questions
0
votes
1
answer
29
views
Visualizing a categorical column in a python scatterplot while clustering [closed]
I'm running HDBSCAN on a massive dataset of geospatial data on rodent inspection sites in New York City (from the NYC open data site). In addition to running the algorithm for the latitude/longitude ...
2
votes
1
answer
110
views
How to identify users' profiles based on the output uplift values
I am currently learning the Causal Forest algorithm in Python. In an exercise, I need to evaluate a marketing campaign where a certain group of users have already received coupons. Given that Y is the ...
Advice
0
votes
5
replies
76
views
Numpy axis rules
I am a Python developer, but there is one thing I don't understand: what are NumPy axes? Sometimes, when I use sklearn, I get errors about axes. I also need an explanation of the values and reshape functions.
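A minimal sketch of how the axis argument and reshape behave, assuming a small made-up 2x3 array (the names a, col_sums, row_sums, feature and column are illustrative only). Axis 0 runs down the rows and axis 1 runs across the columns; reducing along an axis removes it from the shape.

```python
import numpy as np

# Illustrative 2x3 array: 2 rows (axis 0), 3 columns (axis 1).
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Summing along axis 0 collapses the rows, leaving one value per column.
col_sums = a.sum(axis=0)   # shape (3,)

# Summing along axis 1 collapses the columns, leaving one value per row.
row_sums = a.sum(axis=1)   # shape (2,)

# scikit-learn estimators expect 2-D input of shape (n_samples, n_features).
# A 1-D feature vector can be turned into a single-column matrix with reshape;
# -1 asks NumPy to infer that dimension from the array's size.
feature = np.array([1.0, 2.0, 3.0])
column = feature.reshape(-1, 1)   # shape (3, 1)

print(col_sums, row_sums, column.shape)
```

Many shape-related errors from sklearn come from passing a 1-D array where a 2-D one is expected, which is the case the reshape(-1, 1) call covers.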
1
vote
0
answers
45
views
sklearn's FactorAnalysis varimax orthogonal rotation increases correlation of factors
I'm using Scikit-Learn's FactorAnalysis in an application that relies on the assumption that the factors are uncorrelated. It would be great to have more interpretable factors, and an orthogonal ...
Best practices
0
votes
0
replies
72
views
Best way to leverage Polars multithreading with scikit-learn compatibility
I've been working on a project for rapidly testing thousands of outcome variables on a standard set of predictors and covariates using polars. It's working very well, with speed ups as high as 16x ...
1
vote
3
answers
111
views
What does PoissonRegression.predict() actually return in sklearn?
What is being returned by PoissonRegression.predict() in sklearn when I am predicting target values from data? Is it the actual predicted value of the target?
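A rough sketch of one way to check this, assuming the estimator meant is sklearn.linear_model.PoissonRegressor (the class name in scikit-learn) and using synthetic data invented here for illustration. With its log link, predict() should match exp(X @ coef_ + intercept_), that is, the predicted mean of the target rather than the linear predictor.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic count data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = rng.poisson(np.exp(0.5 + X @ np.array([1.0, -0.5])))

model = PoissonRegressor().fit(X, y)

# What predict() returns ...
pred = model.predict(X)

# ... compared with the inverse log link applied to the linear predictor.
manual = np.exp(X @ model.coef_ + model.intercept_)

print(np.allclose(pred, manual))
```

The predictions are therefore on the scale of the target (expected counts), always positive, and generally not integers.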
1
vote
0
answers
40
views
AdaBoost performance degrades when exported to ONNX
My AdaBoost model uses SAMME.R and a Decision Tree as base learner to perform binary classification, the preprocessing is done using a MinMaxScaler. After converting to ONNX and running inference ...
-2
votes
2
answers
60
views
Python ValueError while training Logistic Regression model [duplicate]
I am trying to train a Logistic Regression model using scikit-learn in Python.
When I try to fit the model, I get the following error:
ValueError: could not convert string to float
Here is the code I ...
Tooling
0
votes
0
replies
67
views
Good packages for bounded Linear Quantile Regression?
I'm looking for a good package to train a linear quantile regression model, i.e. $\hat y = \sum_{i=1}^n w_i \cdot x_i$, where $x_i$ are the input features and $w_i$ are the bounded trainable weights. ...
0
votes
1
answer
40
views
Sklearn2pmml raises an error on 'classes_' parameter
I'm trying to create a PMML file from a model as follows:
from sklearn.preprocessing import LabelEncoder
y_h_train = LabelEncoder().fit_transform(y_train.copy(deep=True))
modele_label_encoded = ...
0
votes
1
answer
66
views
Logging SVC/SVM training to log file
I am trying to save the output from sklearn.svm.SVC training when verbose=True to a log file. However, since it uses LibSVM in the back end, I cannot figure out how this works. Copilot hasn't helped.
...
Advice
1
vote
2
replies
134
views
Machine Learning Project using Multidimensional Array Input/Outputs
I am struggling to get my ML model to accept the input and outputs that I need.
My aim is to have it accept this as the input:
input_x = [
((4.11, 8.58, -2.2), (-1.27, -8.76, 2.23)),
((0.43, -...
4
votes
0
answers
224
views
MLflow doesn’t log or show model artifacts after training run
I’m working on a machine learning project using MLflow for experiment tracking (on macOS, Python 3.12, scikit-learn, and DagsHub as the tracking server). The experiment runs successfully — I see the ...
0
votes
0
answers
98
views
How to use sklearn imputation methods on numpy.void (record or structured array, I'm not sure) ndarray
Code:
import numpy as np
import sklearn as skl
data = np.genfromtxt("water_potability.csv", delimiter = ",", names = True)
print(data)
print(data.shape)
print(type(data[0]))
...
0
votes
0
answers
126
views
Shape of tree_.value
According to the sklearn docs the shape of tree_.value is [n_nodes, n_classes, n_outputs]. I just wanted to ask if this is still correct.
I think the correct shape is [n_nodes, n_outputs, n_classes] ...
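A quick way to inspect the shape directly, sketched here on the iris dataset with a shallow DecisionTreeClassifier (both chosen only for illustration). Iris has a single output and three classes, so the order of the last two dimensions is easy to read off.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Iris: one output, three classes (used here only as a convenient example).
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Shape of the per-node value array of the fitted tree.
value_shape = clf.tree_.value.shape

print(value_shape)
print(clf.tree_.node_count, clf.n_outputs_, clf.n_classes_)
```

On this example the middle dimension is 1 (outputs) and the last is 3 (classes), which is consistent with the [n_nodes, n_outputs, n_classes] ordering suggested in the question.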