Newest 'feature-engineering' Questions

Advice

0 votes

1 reply

68 views

How to get good at feature engineering in building machine learning projects

I have been trying to tackle different projects and to implement different machine learning architectures. The first roadblock that I face is how to write the code for implementing feature engineering ...

Revanth Kovvuri

1

asked Apr 16 at 18:21

Advice

0 votes

1 reply

61 views

Strategy for Outlier Removal in Skewed Supply Chain Data

I am working on a research project involving Supply Chain Forecast Matching, and I am stuck on the best strategy for handling outliers. I am seeking advice from a feature engineering perspective. 1. ...

Paavan Shah

1

asked Dec 30, 2025 at 6:09
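For heavily right-skewed quantities such as demand, one common approach is to compute outlier fences on a log scale rather than on the raw values. Below is a minimal sketch that assumes non-negative data and uses Tukey's IQR fences on `log1p`-transformed values. The function name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample numbers are illustrative assumptions, not something taken from the question.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside Tukey's IQR fences on the log1p scale.

    Assumes non-negative data (e.g. order quantities). log1p compresses the long
    right tail, so the fences are not dominated by a handful of huge values.
    """
    logged = np.log1p(values)
    q1, q3 = logged.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (logged < lower) | (logged > upper)


# Hypothetical demand series with one extreme spike.
demand = pd.Series([12, 15, 9, 14, 20, 11, 13, 18, 16, 5000])
mask = iqr_outlier_mask(demand)
print(demand[mask].tolist())  # [5000]
```

Instead of dropping flagged rows, the values could be clipped to the back-transformed fences, which keeps a time-indexed series free of gaps.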

0 votes

0 answers

98 views

In Algotrading, How to Incrementally Calculate Features for New Live Candles, Ensuring Full-Backtest Consistency (Pandas/TA/ML)

I'm developing a live trading bot in Python that fetches OHLCV data (e.g., 15m candles) and computes a large number of features—rolling indicators (VWAP/Volume-ADI, SMA/EMA/ATR/RSI), price action, ...

bolt investor

1

asked Jun 28, 2025 at 5:39
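For recursive indicators, one way to keep live values identical to the backtest is to implement the same recursion that the batch code uses and carry its state between candles. A minimal sketch for an EMA, assuming the backtest uses pandas `ewm(span=..., adjust=False)`: with `adjust=False`, pandas computes `ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}` with `alpha = 2 / (span + 1)` and `ema_0 = x_0`. The class `IncrementalEMA` and the prices are hypothetical.

```python
import numpy as np
import pandas as pd


class IncrementalEMA:
    """Streaming EMA intended to match pandas' ewm(span=span, adjust=False).mean().

    Only the previous EMA value is stored, so each new candle costs O(1)
    instead of recomputing the indicator over the full history.
    """

    def __init__(self, span: int):
        self.alpha = 2.0 / (span + 1)
        self.value = None  # no state until the first candle arrives

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # ema_0 = x_0, as with adjust=False
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value


# Hypothetical close prices: compare the batch (backtest) and streaming (live) results.
closes = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0])
batch = closes.ewm(span=3, adjust=False).mean()

ema = IncrementalEMA(span=3)
live = [ema.update(c) for c in closes]

print(np.allclose(batch.to_numpy(), live))  # True
```

Window-based indicators such as an SMA need a fixed-size buffer of the last N values (e.g. a `collections.deque(maxlen=N)`) rather than a single state value; the same "replay the batch definition exactly" idea applies.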

1 vote

0 answers

1k views

How do I resolve a CONFIG_NOT_AVAILABLE error in Databricks

I am a Databricks newbie attempting to reproduce the actions in a training video. I have successfully uploaded a *.csv file, surgeries2.csv, using spark.read.csv, which displays properly. My next step ...

Barry King

11

asked Mar 7, 2025 at 11:56

3 votes

0 answers

42 views

Unable to fit MeanEncoder. error 'Index' object has no attribute 'infer_objects'

I am trying to run this code in Jupyter Notebook import pandas as pd from feature_engine.encoding import MeanEncoder X = pd.DataFrame(dict(x1 = [1,2,3,4,5], x2 = ["c", "c", "...

Bruno SALANON

31

asked Nov 28, 2024 at 6:34

-1 votes

1 answer

67 views

How to Standardize Features and Relationship Weights for GraphSAGE in Neo4j?

Absolutely! Here's how you could phrase your question for Stack Overflow to get help regarding feature standardization for using GraphSAGE in Neo4j: Title: How to Standardize Features and ...

adts

1

asked Sep 9, 2024 at 9:12

0 votes

1 answer

57 views

Pandas Online Usage Performance Issues

During the training phase, we use DataFrames for data processing. To ensure that the feature processing functions are consistent between online prediction and training, we also convert data to ...

zhcn

21

asked Aug 19, 2024 at 7:32

0 votes

0 answers

63 views

Create a Machine Learning Numpy array with scalar features and an N-dimensional Coordinate Vector feature

I'm trying to format my data for an ML program. There are 33,000 events and each event has 3 things I want to consider: Mass, Energy, a coordinate. The Mass is of the shape (33000,) and looks like: [...

Liam B

1

asked Jul 23, 2024 at 21:28
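A common way to combine per-event scalars with a per-event vector is to flatten everything into one 2-D feature matrix with one row per event. A minimal sketch with random stand-in data, assuming the coordinate is stored as an array of shape `(n_events, d)` with `d = 3` (the question's actual `d` is not shown):

```python
import numpy as np

n_events = 33000
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real arrays.
mass = rng.normal(size=n_events)         # shape (33000,)
energy = rng.normal(size=n_events)       # shape (33000,)
coords = rng.normal(size=(n_events, 3))  # shape (33000, 3), d = 3 assumed

# column_stack turns each 1-D array into a single column and appends the
# columns of the 2-D coordinate array: one row per event.
X = np.column_stack([mass, energy, coords])
print(X.shape)  # (33000, 5)
```

If the coordinates arrive as a list of length-`d` arrays, `np.stack(coords_list)` first converts them into a `(n_events, d)` array.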

0 votes

1 answer

55 views

Create time based features in Pyspark

I have a feature table PySpark DF that gets created every day through a pipeline. Now the ask is to create time-based features for each feature, where each of the t-1 through t-30 (t = time) features captures the ...

Neethu Paul

1

asked Jun 12, 2024 at 13:39
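Lag features of this kind are usually built with a "shift within each entity, ordered by date" operation. The sketch below illustrates that logic in pandas on a tiny hypothetical table (`entity_id`, `date`, `feature_x` are assumed column names, and `max_lag=3` stands in for 30). In PySpark the per-column counterpart would be `pyspark.sql.functions.lag(col, k)` over a `Window.partitionBy(...).orderBy(...)`.

```python
import pandas as pd

# Hypothetical daily feature table: one row per (entity_id, date).
df = pd.DataFrame({
    "entity_id": ["a"] * 5 + ["b"] * 5,
    "date": list(pd.date_range("2024-06-01", periods=5)) * 2,
    "feature_x": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})


def add_lags(frame, cols, max_lag, key="entity_id", order="date"):
    """Add <col>_lag_1 ... <col>_lag_<max_lag> columns, lagged within each entity."""
    frame = frame.sort_values([key, order]).copy()
    grouped = frame.groupby(key)
    new_cols = {
        f"{col}_lag_{k}": grouped[col].shift(k)
        for col in cols
        for k in range(1, max_lag + 1)
    }
    # Concatenate all lag columns at once rather than inserting them one by one.
    return pd.concat([frame, pd.DataFrame(new_cols, index=frame.index)], axis=1)


out = add_lags(df, ["feature_x"], max_lag=3)
print(out[["entity_id", "feature_x", "feature_x_lag_1", "feature_x_lag_3"]])
```

Shifting by rows equals shifting by days only if every entity has exactly one row per day; missing dates would need to be filled in (or the lag computed by date arithmetic) first.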

0 votes

0 answers

98 views

XGBoost Time Series Diff Feature

Let's suppose I have days 1, 2, 3, 4 and want to predict day 5: Features = Weekday, Diff Target = Value Weekday: 1,2,3,4,5 | Diff: NaN,40,20,20,(?) | Value: 20,60,80,100,(?) When I train my model using ...

Vitor Xavier

1

asked May 16, 2024 at 19:28
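Using the numbers from the question: a same-day Diff depends on the value being predicted, so it cannot exist for day 5. One common workaround is to lag the Diff feature by one step so it uses only past values. A minimal pandas sketch (the column names are assumptions):

```python
import numpy as np
import pandas as pd

# Numbers from the question; day 5's value is unknown (NaN).
df = pd.DataFrame({
    "weekday": [1, 2, 3, 4, 5],
    "value": [20, 60, 80, 100, np.nan],
})

# Same-day diff needs the value being predicted, so it is NaN for day 5.
df["diff"] = df["value"].diff()                # NaN, 40, 20, 20, NaN

# Lagging by one step uses only past values, so it is available for day 5.
df["diff_lag1"] = df["value"].diff().shift(1)  # NaN, NaN, 40, 20, 20
print(df)
```

Alternatively, Diff can become the target and the level recovered afterwards: with the last known value of 100, a predicted diff of 20 would give 120.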

0 votes

1 answer

92 views

How to force a model to use a variable [closed]

I have data to train a binary classification model. set.seed(1) n <- 20 dat <- cbind.data.frame(target=as.factor(sample(0:1,n,T)), price=round(rnorm(n)+1000,2), ...

mr.T

634

asked May 8, 2024 at 17:26

-1 votes

1 answer

36 views

One Hot Encoding with large dimensions [closed]

I am building a sales prediction model which consists of "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", ...

ProfessorE

47

asked Apr 30, 2024 at 15:23
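When ID columns such as `Customer_Id` and `Product_Id` have many distinct values, a dense one-hot matrix can become very large. One option is to keep the dummy columns sparse. A minimal sketch on hypothetical rows using pandas' `get_dummies(..., sparse=True)`; scikit-learn's `OneHotEncoder` likewise returns a SciPy sparse matrix by default. Frequency or target encoding are other options for high-cardinality IDs.

```python
import pandas as pd

# Hypothetical sample with two high-cardinality ID columns.
df = pd.DataFrame({
    "Customer_Id": ["c1", "c2", "c3", "c1"],
    "Product_Id": ["p1", "p1", "p2", "p3"],
    "Quantity": [5, 3, 7, 1],
})

# sparse=True stores each dummy column as a SparseDtype array, which keeps only
# the non-fill entries instead of a full rows x categories block.
encoded = pd.get_dummies(df, columns=["Customer_Id", "Product_Id"], sparse=True)
print(encoded.dtypes)
```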

0 votes

0 answers

66 views

Removing the Moving Component/Slope from a graph

I have a graph which slowly increases and there's a peak in the graph. This graph is derived from a numpy array. How can I make it a flat line like an ECG, with just the major peak? I am thinking of ...

Adarsh

9

asked Apr 26, 2024 at 13:14
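If the slow increase is roughly linear, one simple approach is linear detrending: fit a straight line to the whole signal and subtract it. A minimal NumPy sketch on a synthetic signal (the drift slope, peak position, and peak shape are assumptions):

```python
import numpy as np

# Hypothetical signal: slow linear drift plus one sharp peak at index 500.
t = np.arange(1000)
drift = 0.01 * t
peak = 5.0 * np.exp(-0.5 * ((t - 500) / 5.0) ** 2)
y = drift + peak

# Fit a straight line to the whole signal and subtract it (linear detrending).
slope, intercept = np.polyfit(t, y, deg=1)
flat = y - (slope * t + intercept)

print(int(np.argmax(flat)))  # 500
```

The peak pulls the fitted line up slightly, so the flattened baseline sits a little below zero; `scipy.signal.detrend` performs the same linear fit, and a rolling median or low-order polynomial baseline can be subtracted instead if the drift is not linear.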

-1 votes

1 answer

475 views

Time Series Rolling Windows Feature [closed]

If I'm creating a Rolling Mean Feature based on my Sales (target) column, is it necessary to shift it? Let me give an example: Let's suppose I have days 01~10 in my dataset. If I create a Mean Rolling ...

Vitor Xavier

27

asked Mar 22, 2024 at 5:35
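The difference the question asks about can be seen directly: an unshifted rolling mean of the target includes the current day's target in its own feature, while `shift(1)` restricts the window to past days. A minimal pandas sketch with hypothetical sales for days 1 to 10 and a window of 3:

```python
import pandas as pd

# Hypothetical daily sales for days 1-10.
sales = pd.Series([10, 12, 11, 15, 14, 16, 18, 17, 20, 19],
                  index=pd.RangeIndex(1, 11, name="day"), dtype=float)

# Without shift: the window ending on day t includes day t's own sales,
# i.e. the value the model is supposed to predict (target leakage).
leaky = sales.rolling(3).mean()

# With shift(1): the window ending on day t covers days t-3 .. t-1 only.
safe = sales.shift(1).rolling(3).mean()

print(pd.DataFrame({"sales": sales, "leaky": leaky, "safe": safe}))
```

On day 4, `leaky` averages days 2 to 4 (including the target itself), while `safe` averages days 1 to 3 and gives 11.0.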

0 votes

1 answer

195 views

What is a vectorized way to detect feature drift in pandas columns?

I'm working on very large pandas dataframes that hold time series with significant feature drift. The drift is often sudden (e.g., the features would be 1.5-2.0x larger than a few periods ...

KingOtto

1,710

asked Mar 11, 2024 at 16:14
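A sudden jump of 1.5-2.0x relative to recent periods can be flagged without loops by dividing every column by a rolling baseline of its own past values. A minimal sketch, assuming positive-valued features and using the median of the previous `window` periods as the baseline; `flag_sudden_drift`, `window=5`, and `threshold=1.5` are hypothetical choices.

```python
import numpy as np
import pandas as pd


def flag_sudden_drift(df, window=5, threshold=1.5):
    """Flag cells whose value is >= threshold times the rolling median of the
    preceding `window` periods. Operates on all columns at once."""
    baseline = df.shift(1).rolling(window).median()  # past values only
    ratio = df / baseline
    return ratio >= threshold  # NaN ratios (warm-up rows) compare as False


# Hypothetical data: column "b" jumps from 10 to 20 at row 8.
df = pd.DataFrame({
    "a": np.full(12, 10.0),
    "b": [10.0] * 8 + [20.0] * 4,
})
flags = flag_sudden_drift(df)
print(flags[flags.any(axis=1)])
```

Because the baseline is a median of past values, a level shift stays flagged until about half the window has moved to the new level (rows 8 to 10 here). For features that can be zero or negative, a difference-based or z-score test would be safer than a ratio.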
