
Questions tagged [machine-learning]

Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.

0 votes
0 answers
6 views

Popular research topics of the future [closed]

This is a general question with no definitive answer, but I am interested in what you think the popular research directions in statistics/data science will be in the future. Currently (...
Red's user avatar
  • 367
1 vote
0 answers
10 views

When testing a specific hypothesis regarding HTE with "best_linear_projection" in a Causal Forest, is it valid to halve the p-value?

I’m using the "grf" package in R and its "best_linear_projection" function, which regresses doubly robust (AIPW) scores on a set of covariates/features. I have a directional ...
Jo99's user avatar
  • 11
0 votes
0 answers
25 views

What do I need to determine what the best model is to interpolate missing time series data?

I have a numerical time series with missing data points (rows). Someone said use a spline here, but that is old. I want a modern approach. AI recommends XGBRegressor or RandomForestRegressor. How do I ...
Michael's user avatar
  • 11
0 votes
0 answers
14 views

Is it a bad idea to use Transformer models on long-tailed datasets?

I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few. More specifically, my dataset has around 9k samples and 3....
Olivia's user avatar
  • 191
4 votes
2 answers
293 views

Is it okay in prediction problems to put post-outcome features in the model?

I am relatively new to machine learning. I see many examples of practices where people include variables that are only available after the outcome variable (Y) to make predictions. An example of this ...
Abdullah Abdelaziz's user avatar
1 vote
0 answers
20 views

Need advice on length of context for future prediction

I'm using a trained foundation model to forecast values on a time series. The model works by taking a window of recent data (context) to predict near-future outcomes (horizon). How can I know the ...
Michael's user avatar
  • 11
1 vote
0 answers
28 views

In ablation with a fixed algorithm and fixed hyperparameters, can the expected test risk increase when adding strictly informative features?

Goal (decision-theoretic): I want to know whether there exist conditions under which the EXPECTED test risk strictly increases when I add information to the input, while keeping the learning rule fixed....
Jacopo Mancini's user avatar
0 votes
0 answers
64 views

Why is cross-validation better than data-splitting for small datasets? [closed]

In model building, it is common practice to split the entire dataset into training and testing sets and then use the training set to build the model. However, I favor K cross-...
Rahul's user avatar
  • 133
0 votes
0 answers
19 views

Non-linear regression for modeling accuracy of ML models

Suppose I have a slow model with an accuracy between 75% and 80%. I want to approximate this model with faster models. Fast models require $e$ effort, and the more effort the better. I want to estimate ...
Gaslight Deceive Subvert's user avatar
2 votes
0 answers
20 views

Mixed-effects random forest regression conditional variable permutation importance software implementations

Is there any existing open source software implementation of mixed effects random forest regression (for clustered data) that employs conditional inference decision trees as base learners, and enables ...
Mike's user avatar
  • 21
1 vote
2 answers
63 views

How can Kernel Density Estimation learn multiple classes?

I stumbled upon this example on the scikit-learn website, where a KDE instance is trained on handwritten digits and then used to synthesize samples: https://scikit-learn.org/stable/auto_examples/...
Polyval4's user avatar
0 votes
0 answers
29 views

Transformation for clinical data with skewed distribution and a lot of zeros

I’m building a machine learning model using medical data. The features include clinical measurements (e.g., hemoglobin level in blood). The lab confirmed that some of these values can actually be ...
Marco Simoni's user avatar
5 votes
1 answer
97 views

Is there a "better" approach when it comes to model evaluation on multiple test datasets?

I have two models trained and validated on the same training/validation data. Now I need to evaluate them on multiple independent test datasets (e.g., 10 different datasets of the same measure). Which ...
user26416177's user avatar
-1 votes
0 answers
52 views

Has this method of unsupervised learning already been tested? [closed]

Suppose the data consist of $n$ parameters, and consider a layer that also has $n$ parameters. Let $f$ denote the function $[0,1]^n \rightarrow [0,1]^n$ from the first to the second layer, which we will consider ...
0 votes
0 answers
49 views

How to apply a Naive Bayes classifier when classes have different binary feature subsets?

I have a large number of classes $\mathcal{C} = \{c_1, c_2, \dots, c_k\}$, where each class $c$ contains an arbitrarily sized subset of features drawn from the full space of binary features $\mathbf{X}...
Special Sauce's user avatar
