Commit 9a22eb6

author
Dennis Bakhuis
committed
tip 7
1 parent aac0ae5 commit 9a22eb6

4 files changed

Lines changed: 137 additions & 0 deletions

@@ -0,0 +1,137 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Pandas tip #7: Give aggregation a name\n",
8+
"Not sure if it is just my OCD, but my mind has difficulty working with wrong names. When you aggregate after a .groupby() in Pandas, the result generally keeps the original column name. However, when your column is called cost and your aggregate is .count(), the resulting column is still called cost even though it now holds a count.\n",
9+
"\n",
10+
"There are many ways to change the name, the most obvious being the .rename() method. While it works, it has always felt a bit clunky. A much neater way is the .agg() method. It can perform several aggregations at once, and its keyword syntax resembles .assign(): each keyword is the new column name and its value is a (column, aggregation) tuple. I also think it is clearer than a separate .rename()."
11+
]
12+
},
13+
{
14+
"cell_type": "markdown",
15+
"metadata": {},
16+
"source": [
17+
"Let's generate some random data:"
18+
]
19+
},
20+
{
21+
"cell_type": "code",
22+
"execution_count": null,
23+
"metadata": {},
24+
"outputs": [],
25+
"source": [
26+
"import numpy as np\n",
27+
"import pandas as pd\n",
28+
"\n",
29+
"categories = list('ABCD')\n",
30+
"n_samples = 10_000\n",
31+
"\n",
32+
"rng = np.random.default_rng()\n",
33+
"df = pd.DataFrame({\n",
34+
" 'category': rng.choice(categories, size=n_samples),\n",
35+
" 'cost': rng.integers(1, 100, size=n_samples),\n",
36+
"})"
37+
]
38+
},
39+
{
40+
"cell_type": "code",
41+
"execution_count": null,
42+
"metadata": {},
43+
"outputs": [],
44+
"source": [
45+
"(df\n",
46+
" .groupby('category')\n",
47+
" .count()\n",
48+
")"
49+
]
50+
},
51+
{
52+
"cell_type": "markdown",
53+
"metadata": {},
54+
"source": [
55+
"The easiest way is to rename your columns afterwards:"
56+
]
57+
},
58+
{
59+
"cell_type": "code",
60+
"execution_count": null,
61+
"metadata": {},
62+
"outputs": [],
63+
"source": [
64+
"(df\n",
65+
" .groupby('category')\n",
66+
" .count()\n",
67+
" .rename(columns={'cost': 'count_cost'})\n",
68+
")"
69+
]
70+
},
71+
{
72+
"cell_type": "markdown",
73+
"metadata": {},
74+
"source": [
75+
"The `.agg()` method has an `.assign()`-like pattern that combines the rename with the aggregate:"
76+
]
77+
},
78+
{
79+
"cell_type": "code",
80+
"execution_count": null,
81+
"metadata": {},
82+
"outputs": [],
83+
"source": [
84+
"# https://linkedin.com/in/dennisbakhuis\n",
85+
"(df\n",
86+
" .groupby('category')\n",
87+
" .agg(\n",
88+
" count=('cost', 'count'),\n",
89+
" sum_cost=('cost', 'sum')\n",
90+
" )\n",
91+
")"
92+
]
93+
},
94+
{
95+
"cell_type": "markdown",
96+
"metadata": {},
97+
"source": [
98+
"If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis)."
99+
]
100+
},
101+
{
102+
"cell_type": "code",
103+
"execution_count": null,
104+
"metadata": {},
105+
"outputs": [],
106+
"source": []
107+
},
108+
{
109+
"cell_type": "code",
110+
"execution_count": null,
111+
"metadata": {},
112+
"outputs": [],
113+
"source": []
114+
}
115+
],
116+
"metadata": {
117+
"kernelspec": {
118+
"display_name": "Python 3",
119+
"language": "python",
120+
"name": "python3"
121+
},
122+
"language_info": {
123+
"codemirror_mode": {
124+
"name": "ipython",
125+
"version": 3
126+
},
127+
"file_extension": ".py",
128+
"mimetype": "text/x-python",
129+
"name": "python",
130+
"nbconvert_exporter": "python",
131+
"pygments_lexer": "ipython3",
132+
"version": "3.7.7"
133+
}
134+
},
135+
"nbformat": 4,
136+
"nbformat_minor": 4
137+
}
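
As a quick check of the tip in this notebook, here is a minimal standalone sketch comparing the two approaches. The fixed seed, the smaller sample size, and the output name `count_cost` used for both approaches are assumptions made for reproducibility and easy comparison; the notebook itself uses an unseeded generator. Keyword-style named aggregation needs pandas 0.25 or newer, and `pd.NamedAgg` is shown only as a more explicit spelling of the same `(column, aggfunc)` tuple.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is an assumption;
# the notebook calls default_rng() without one).
rng = np.random.default_rng(42)
n_samples = 1_000
df = pd.DataFrame({
    'category': rng.choice(list('ABCD'), size=n_samples),
    'cost': rng.integers(1, 100, size=n_samples),
})

# Approach 1: aggregate first, then rename the resulting column.
renamed = (
    df
    .groupby('category')
    .count()
    .rename(columns={'cost': 'count_cost'})
)

# Approach 2: named aggregation. Each keyword is the output column name,
# each value is a (source column, aggregation) tuple.
named = (
    df
    .groupby('category')
    .agg(
        count_cost=('cost', 'count'),
        sum_cost=('cost', 'sum'),
    )
)

# pd.NamedAgg spells out the same (column, aggfunc) pair explicitly.
explicit = (
    df
    .groupby('category')
    .agg(count_cost=pd.NamedAgg(column='cost', aggfunc='count'))
)

print(named)
```

Both approaches should yield the same `count_cost` column; the named form simply avoids the intermediate, misleadingly named `cost` column and lets several aggregates of the same source column sit side by side.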
239 KB
200 KB
117 KB
Binary file not shown.
