148,634 questions
Score of 1
2 answers
121 views
df.explode() has no effect when used in a for-loop
I want to call multiple pandas functions on multiple DataFrames, but to no avail. Unfortunately, I do not understand why, since I think it should work because pandas is referencing the DataFrames.
Below is s ...
Score of 2
1 answer
121 views
Unexpected duplicates after pd.pivot_table [closed]
I have a data set that looks like this:
>>> df_anon.head()
   a   b  c         d
0  1  30  1  929.3453
1  1  30  3  875.3986
2  1  30  5  849.9972
3  1  51  1  571.8364
4  1  51  2 ...
Best practices
0 votes
6 replies
107 views
How to detect a specific sequence of string values within a Dataframe column
I am performing analysis of some log files with Python/Pandas, and I am trying to develop an efficient operation to find a specific sequence of string values within a column. My current idea was to ...
Score of 0
1 answer
112 views
Assign category in new column based on multiple parameters [duplicate]
Using the Palmer penguins dataset and the test data df_query below, I need to write a function or set of commands to add a new column that assigns each penguin a CATEGORY of small, medium, or large.
...
Score of 1
0 answers
105 views
Formatting data matrix for Jaccard Similarity
I'm trying to do a Jaccard Similarity on my presence/absence data in RStudio, but I get this error.
jaccard_dist <- vegdist(dat2, method = "jaccard", binary = TRUE)
Error in vegdist(dat2, ...
Score of -8
2 answers
172 views
python dataframe rolling by date to concatenate a string
In "python dataframe rolling by date to get sum and concatenate a string", Furas demonstrated how to concatenate a string, the transaction id, from a rolling group-by. This does not solve the problem. I ...
Advice
1 vote
6 replies
137 views
Replacing xls file input with txt input
I have some code that handles .xls files, but the input files are not consistent.
If I use .txt files as input, the problem may be solved. I need some sample code with the same functionality.
...
Score of -1
1 answer
165 views
python dataframe rolling by date to get sum and concatenate a string
I need to sum up transaction amounts over a rolling period and concatenate the transaction ids that make up the total.
Full code is at the bottom.
The statements below return the expected results.
...
Score of 0
1 answer
124 views
DolphinDB Python API: __DolphinDB_Type__ triggers Pandas UserWarning
I'm using the DolphinDB Python API to upload a pandas DataFrame and want to control the DolphinDB column types, for example trade_time as DATETIME instead of the default STRING.
Here's what I'm doing:
...
Best practices
1 vote
5 replies
121 views
How to avoid iterrows() for string concatenation and One-Hot Encoding in Pandas?
I am a university freshman learning AI, and we are currently working on a Kaggle dataset. I need to concatenate two string columns (ColA and ColB) and then convert the result into One-Hot Encoding.
...
Advice
1 vote
2 replies
63 views
How do I look into https://raw.githubusercontent.com/python-visualization/folium/master/examples/data to see available data?
I am learning how to create maps using Python, and many of the examples I learn from use
https://raw.githubusercontent.com/python-visualization/folium/master/examples/data
as an example dataset. ...
Score of -5
1 answer
186 views
python dataframe rolling by date to get sum
I need to accumulate the sum of transaction amounts, looking back a set number of days. 10 days would be a start.
I have petty cash transactions for a couple of people and I want to sum their spending ...
Score of 3
2 answers
229 views
How to deduplicate (based on two identical columns) and merge the remaining columns into a single row for a large dataframe?
Example data:
ID<-c("A","A","A","A","A","A","B","B","B")
HFAdmission<-c("2020-01-01", "...
Score of 3
2 answers
144 views
Why is the second "over" needed?
Taking the data from this question
add a new column based on a group without grouping
df = pl.DataFrame({
'year': [ 5, 5, 5,
10, 10,
15, 15,
30, 30, 30 ],
...
Score of -2
1 answer
125 views
Optimize a Python Polars function to avoid counting IDs or stacking empty dataframes
I have a common pattern in my workflows where I have a 'primary' dataframe which I may need to subset to a portion of rows, update values and potentially add new columns, and then merge those subset ...