455 questions
Advice
0 votes, 1 reply, 68 views
How to get good at feature engineering in building machine learning projects
I have been trying to tackle different projects and to implement different machine learning architectures. The first roadblock that I face is how to write the code for implementing feature engineering ...
Advice
0 votes, 1 reply, 61 views
Strategy for Outlier Removal in Skewed Supply Chain Data
I am working on a research project involving Supply Chain Forecast Matching, and I am stuck on the best strategy for handling outliers. I am seeking advice from a feature engineering perspective.
1. ...
0 votes, 0 answers, 98 views
In Algotrading, How to Incrementally Calculate Features for New Live Candles, Ensuring Full-Backtest Consistency (Pandas/TA/ML)
I'm developing a live trading bot in Python that fetches OHLCV data (e.g., 15m candles) and computes a large number of features—rolling indicators (VWAP/Volume-ADI,SMA/EMA/ATR/RSI), price action, ...
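One way to keep live and backtest values identical for a recursive indicator is to use the same recurrence in both paths. The sketch below covers only a plain EMA and assumes the backtest uses pandas `ewm(..., adjust=False)`; the prices, `span`, and the helper `ema_update` are hypothetical, and other indicators (ATR, RSI, VWAP) would each need their own state.

```python
import numpy as np
import pandas as pd


def ema_update(prev_ema: float, price: float, span: int) -> float:
    """Advance an EMA by one candle using only the previous EMA value."""
    alpha = 2.0 / (span + 1)
    return alpha * price + (1.0 - alpha) * prev_ema


# Hypothetical closing prices; a live bot would receive these one at a time.
closes = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1, 102.7])
span = 3

# Batch (backtest) computation over the full history.
batch_ema = closes.ewm(span=span, adjust=False).mean()

# Incremental (live) computation, seeded with the first close.
live_ema = [closes.iloc[0]]
for price in closes.iloc[1:]:
    live_ema.append(ema_update(live_ema[-1], price, span))

# True when both paths agree candle by candle.
ema_matches = np.allclose(batch_ema.to_numpy(), live_ema)
print(ema_matches)
```

The same check (recompute in batch, compare against the incrementally maintained state) can serve as a regression test for each feature before it goes live.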
1 vote, 0 answers, 1k views
How do I resolve a CONFIG_NOT_AVAILABLE error in Databricks
I am a Databricks newbie attempting to reproduce the actions in a training video. I have successfully uploaded a *.csv file, surgeries2.csv, using spark.read.csv, which displays properly.
My next step ...
3 votes, 0 answers, 42 views
Unable to fit MeanEncoder. error 'Index' object has no attribute 'infer_objects'
I am trying to run this code in Jupyter Notebook
import pandas as pd
from feature_engine.encoding import MeanEncoder
X = pd.DataFrame(dict(x1 = [1,2,3,4,5], x2 = ["c", "c", "...
-1 votes, 1 answer, 67 views
How to Standardize Features and Relationship Weights for GraphSAGE in Neo4j?
Absolutely! Here's how you could phrase your question for Stack Overflow to get help regarding feature standardization for using GraphSAGE in Neo4j:
Title: How to Standardize Features and ...
0 votes, 1 answer, 57 views
Pandas Online Usage Performance Issues
During the training phase, we use pandas DataFrames for data processing. To ensure that the feature processing functions are consistent between online prediction and training, we also convert data to ...
0 votes, 0 answers, 63 views
Create a Machine Learning Numpy array with scalar features and an N-dimensional Coordinate Vector feature
I'm trying to format my data for an ML program. There are 33,000 events, and each event has 3 things I want to consider: Mass, Energy, and a coordinate.
The Mass is of the shape (33000,) and looks like: [...
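A common layout for this kind of data is one row per event, with the scalar features and the coordinate components side by side as columns. The sketch below assumes Energy also has shape (33000,) and that the coordinate is 3-dimensional with shape (33000, 3); the excerpt does not state either, and the random values are placeholders.

```python
import numpy as np

n_events = 33000
rng = np.random.default_rng(0)

# Placeholder data; the shapes for energy and coords are assumptions.
mass = rng.random(n_events)          # shape (33000,)
energy = rng.random(n_events)        # shape (33000,)
coords = rng.random((n_events, 3))   # shape (33000, 3)

# column_stack turns each 1-D array into a column and keeps 2-D arrays as-is,
# giving one row per event: [mass, energy, x, y, z].
features = np.column_stack([mass, energy, coords])
print(features.shape)
```

For a coordinate with a different number of dimensions, only the second axis of `coords` changes; the stacking call stays the same.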
0 votes, 1 answer, 55 views
Create time based features in Pyspark
I have a PySpark DataFrame feature table that is created every day through a pipeline. The task now is to create time-based features for each feature, where each of the t-1 through t-30 (t = time) features captures the ...
0 votes, 0 answers, 98 views
XGBoost Time Series Diff Feature
Let's suppose I have day 1,2,3,4 and want to predict day 5:
Features = Weekday, Diff
Target = Value
Weekday: 1,2,3,4,5 | Diff: NaN,40,20,20,(?) | Value: 20,60,80,100,(?)
When I train my model using ...
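The Diff column in the excerpt can be reproduced with pandas `diff()`. Because the excerpt is cut off, the sketch below only illustrates the setup: the same-day Diff for day 5 depends on day 5's unknown Value, so one option (an assumption here, not necessarily what the asker wants) is to feed the model the previous day's Diff instead.

```python
import pandas as pd

# Days 1-4 from the excerpt.
df = pd.DataFrame({"Weekday": [1, 2, 3, 4], "Value": [20, 60, 80, 100]})

# Same-day difference: NaN, 40, 20, 20 (matches the excerpt).
df["Diff"] = df["Value"].diff()

# Lagged difference: uses only information available before each day.
df["Diff_lag1"] = df["Diff"].shift(1)

# For day 5, the most recent known difference is the one observed on day 4.
diff_feature_for_day5 = df["Diff"].iloc[-1]
print(df)
```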
0 votes, 1 answer, 92 views
How to force a model to use a variable [closed]
I have data for training a binary classification model.
set.seed(1)
n <- 20
dat <- cbind.data.frame(target=as.factor(sample(0:1,n,T)),
price=round(rnorm(n)+1000,2),
...
-1 votes, 1 answer, 36 views
One Hot Encoding with large dimensions [closed]
I am building a sales prediction model which consists of "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", ...
0 votes, 0 answers, 66 views
Removing the Moving Component/Slope from a graph
I have a graph which slowly increases and there's a peak in the graph. This graph is derived from a numpy array. How can I make it a flat line like an ECG, with just the major peak?
I am thinking of ...
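If the slow increase is roughly linear, one simple option is to fit a straight line and subtract it. The sketch below uses a synthetic signal (the slope, offset, and peak position are made up) and `np.polyfit`; `scipy.signal.detrend` offers a similar linear detrend, and a curved baseline would need a higher-degree fit or a rolling median instead.

```python
import numpy as np

# Synthetic stand-in for the array: slow linear rise plus one sharp peak.
x = np.arange(200, dtype=float)
signal = 0.05 * x + 2.0
signal[100] += 10.0

# Fit a degree-1 polynomial (a straight line) to the whole signal.
slope, intercept = np.polyfit(x, signal, deg=1)

# Subtract the fitted trend; the baseline ends up near zero, the peak remains.
flattened = signal - (slope * x + intercept)
peak_index = int(np.argmax(flattened))
print(peak_index)
```

A large peak pulls the least-squares line slightly toward itself, so the baseline sits a little below zero; fitting the line on a peak-free region avoids that.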
-1 votes, 1 answer, 475 views
Time Series Rolling Windows Feature [closed]
If I'm creating a Rolling Mean Feature based on my Sales (target) column, is it necessary to shift it?
Let me give an example:
Let's suppose I have days 01~10 in my dataset. If I create a Mean Rolling ...
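The difference between the shifted and unshifted feature can be seen on a small example. The sketch below uses made-up sales for days 1 to 10 and a hypothetical window of 3: without a shift, the rolling mean for a day includes that day's own sales (the target), while shifting by one row first restricts it to earlier days.

```python
import pandas as pd

# Made-up sales for days 1..10.
sales = pd.DataFrame({
    "day": range(1, 11),
    "sales": [10, 12, 9, 14, 15, 11, 13, 16, 12, 18],
})
window = 3

# Unshifted: the window for day t covers days t-2..t, so it contains the target.
sales["roll_mean_leaky"] = sales["sales"].rolling(window).mean()

# Shifted: the window for day t covers days t-3..t-1 only.
sales["roll_mean_shifted"] = sales["sales"].shift(1).rolling(window).mean()

print(sales)
```

With the shift, the first `window` rows are NaN, and the value on day 4 equals the mean of days 1 to 3.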
0 votes, 1 answer, 195 views
What is a vectorized way to detect feature drift in pandas columns?
I'm working on very large pandas dataframes that hold time series with significant feature drift. The drift is often sudden (e.g., the features would be 1.5-2.0x larger than a few periods ...
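One loop-free way to flag sudden level shifts is to compare each column's trailing rolling mean with the rolling mean of the preceding window. The sketch below runs on synthetic data with a 1.8x jump injected at row 120; the window of 10 and the 1.4x threshold are arbitrary assumptions that would need tuning on real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "stable": rng.normal(10.0, 0.1, n),
    "drifting": rng.normal(10.0, 0.1, n),
})
# Inject a sudden 1.8x level shift from row 120 onward.
df.loc[120:, "drifting"] *= 1.8

window = 10
recent = df.rolling(window).mean()   # mean of rows t-9..t
previous = recent.shift(window)      # mean of rows t-19..t-10
ratio = recent / previous

# Flag rows where the level changed by more than 1.4x in either direction.
drift_flags = (ratio > 1.4) | (ratio < 1 / 1.4)

# First flagged row in the drifting column.
first_drift_idx = int(drift_flags["drifting"].idxmax())
print(first_drift_idx)
```

All operations act on the whole frame at once, so every column is checked in the same pass. Flags appear a few rows after the jump (once enough post-jump rows fill the recent window) and stop once both windows sit on the new level.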