Questions tagged [machine-learning]
Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.
20,433 questions
0
votes
0
answers
6
views
Popular research topics of the future [closed]
This is a general question with no definitive answer, but I am interested in what you think the popular research directions in statistics/data science will be in the future. Currently (...
1
vote
0
answers
10
views
When testing a specific hypothesis regarding HTE with "best_linear_projection" in a Causal Forest, is it valid to halve the p-value?
I’m using the "grf" package in R and its "best_linear_projection" function, which regresses doubly robust (AIPW) scores on a set of covariates/features. I have a directional ...
0
votes
0
answers
25
views
What do I need to determine what the best model is to interpolate missing time series data?
I have a numerical time series with missing data points (rows). Someone said use a spline here, but that is old. I want a modern approach. AI recommends XGBRegressor or RandomForestRegressor. How do I ...
0
votes
0
answers
14
views
Is it a bad idea to use Transformer models on long-tailed datasets?
I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few.
More specifically, my dataset has around 9k samples and 3....
4
votes
2
answers
293
views
Is it okay in prediction problems to put post-outcome features in the model?
I am relatively new to machine learning. I see many examples of practices where people include variables that are only available after the outcome variable (Y) to make predictions.
An example of this ...
1
vote
0
answers
20
views
Need advice on length of context for future prediction
I'm using a trained foundation model to forecast values on a time series. The model works by taking a window of recent data (context) to predict near-future outcomes (horizon).
How can I know the ...
1
vote
0
answers
28
views
In ablation with a fixed algorithm and fixed hyperparameters, can the expected test risk increase when adding strictly informative features?
Goal (decision-theoretic)
I want to know whether there exist conditions under which the EXPECTED test risk strictly increases when I add information to the input, while keeping the learning rule fixed....
0
votes
0
answers
64
views
Why is cross-validation better than data-splitting for small datasets? [closed]
In model building, it is common practice to split the entire dataset into training and testing sets and then use the training set to build the model. However, I favor K cross-...
0
votes
0
answers
19
views
Non-linear regression for modeling accuracy of ML models
Suppose I have a slow model with an accuracy of between 75 and 80%. I want to approximate this model with faster models. Fast models require $e$ effort and the more effort the better. I want to estimate ...
2
votes
0
answers
20
views
Mixed-effects random forest regression conditional variable permutation importance software implementations
Is there any existing open source software implementation of mixed effects random forest regression (for clustered data) that employs conditional inference decision trees as base learners, and enables ...
1
vote
2
answers
63
views
How can Kernel Density Estimation learn multiple classes?
I've stumbled upon this example on the scikit-learn website, where a KDE instance is trained on handwritten digits and then used to synthesize samples: https://scikit-learn.org/stable/auto_examples/...
0
votes
0
answers
29
views
Transformation for clinical data with skewed distribution and a lot of zeros
I’m building a machine learning model using medical data.
The features include clinical measurements (e.g., hemoglobin level in blood). The lab confirmed that some of these values can actually be ...
5
votes
1
answer
97
views
Is there a "better" approach when it comes to model evaluation on multiple test datasets?
I have two models trained and validated on the same training/validation data.
Now I need to evaluate them on multiple independent test datasets (e.g., 10 different datasets of the same measure).
Which ...
-1
votes
0
answers
52
views
Has this method of unsupervised learning already been tested? [closed]
Suppose the data consist of $n$ parameters, and consider a layer that also has $n$ parameters. We denote by $f$ the function $[0,1]^n \rightarrow [0,1]^n$ from the first to the second layer, which we will consider ...
0
votes
0
answers
49
views
How to apply a Naive Bayes classifier when classes have different binary feature subsets?
I have a large number of classes $\mathcal{C} = \{c_1, c_2, \dots, c_k\}$, where each class $c$ contains an arbitrarily sized subset of features drawn from the full space of binary features $\mathbf{X}...