Advice
0 votes
1 reply
68 views

I have been trying to tackle different projects and to implement different machine learning architectures. The first roadblock that I face is how to write the code for implementing feature engineering ...
Revanth Kovvuri
Advice
0 votes
1 reply
61 views

I am working on a research project involving Supply Chain Forecast Matching, and I am stuck on the best strategy for handling outliers. I am seeking advice from a feature engineering perspective. 1. ...
Paavan Shah
0 votes
0 answers
98 views

I'm developing a live trading bot in Python that fetches OHLCV data (e.g., 15m candles) and computes a large number of features: rolling indicators (VWAP/Volume-ADI, SMA/EMA/ATR/RSI), price action, ...
bolt investor
1 vote
0 answers
1k views

I am a Databricks newbie attempting to reproduce the actions in a training video. I have successfully uploaded a *.csv file, surgeries2.csv, using spark.read.csv, which displays properly. My next step ...
Barry King
3 votes
0 answers
42 views

I am trying to run this code in Jupyter Notebook: import pandas as pd; from feature_engine.encoding import MeanEncoder; X = pd.DataFrame(dict(x1 = [1,2,3,4,5], x2 = ["c", "c", "...
Bruno SALANON
-1 votes
1 answer
67 views

I need help with feature standardization for using GraphSAGE in Neo4j. Title: How to Standardize Features and ...
adts
  • 1
0 votes
1 answer
57 views

During the training phase, we use a DataFrame for data processing. To ensure that the feature processing functions are consistent between online prediction and training, we also convert data to ...
zhcn
  • 21
0 votes
0 answers
63 views

I'm trying to format my data for an ML program. There are 33,000 events, and each event has 3 things I want to consider: Mass, Energy, and a coordinate. The Mass has shape (33000,) and looks like: [...
Liam B
  • 1
0 votes
1 answer
55 views

I have a PySpark DataFrame feature table that is created every day by a pipeline. The ask now is to create time-based features for each feature, where each of the t-1 through t-30 (t = time) features captures the ...
Neethu Paul
0 votes
0 answers
98 views

Let's suppose I have days 1, 2, 3, 4 and want to predict day 5. Features = Weekday, Diff; Target = Value. Weekday: 1,2,3,4,5 | Diff: NaN,40,20,20,(?) | Value: 20,60,80,100,(?) When I train my model using ...
Vitor Xavier
0 votes
1 answer
92 views

I have data for training a binary classification model. set.seed(1) n <- 20 dat <- cbind.data.frame(target=as.factor(sample(0:1,n,T)), price=round(rnorm(n)+1000,2), ...
mr.T
  • 634
-1 votes
1 answer
36 views

I am building a sales prediction model which consists of "Year", "Month", "Economy Indicator", "Customer_Id", "Product_Id", "Quantity", ...
ProfessorE
0 votes
0 answers
66 views

I have a graph that slowly increases and has a single peak. The graph is derived from a NumPy array. How can I make it a flat line like an ECG, with just the major peak? I am thinking of ...
Adarsh
  • 9
-1 votes
1 answer
475 views

If I'm creating a rolling mean feature based on my Sales (target) column, is it necessary to shift it? Let me give an example: Let's suppose I have days 01~10 in my dataset. If I create a Mean Rolling ...
Vitor Xavier
0 votes
1 answer
195 views

I'm working on very large pandas DataFrames that hold time series with significant feature drift. The drift is often sudden (e.g., the features would be 1.5-2.0x larger than a few periods ...
KingOtto
  • 1,710
