dennisbakhuis
diff --git a/‎B_Pandas_tips/7 - Give aggregation a name.ipynb‎
Lines changed: 137 additions & 0 deletions b/‎B_Pandas_tips/7 - Give aggregation a name.ipynb‎
diff --git a/‎B_Pandas_tips/Assets/6 - Selecting a range.png‎
239 KB b/‎B_Pandas_tips/Assets/6 - Selecting a range.png‎
diff --git a/‎B_Pandas_tips/Assets/7 - Give aggregation a name.png‎
200 KB b/‎B_Pandas_tips/Assets/7 - Give aggregation a name.png‎
diff --git a/‎B_Pandas_tips/Assets/Pandas_Tricks_Slides.pptx‎
117 KB b/‎B_Pandas_tips/Assets/Pandas_Tricks_Slides.pptx‎
@@ -0,0 +1,137 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Pandas tip #7: Give aggregation a name\n",
+    "Maybe it is just my OCD, but my mind has difficulty working with wrong names. When you aggregate after a .groupby() in Pandas, the result generally keeps the original column name. However, when your column is called cost and your aggregate is .count(), the resulting name no longer describes the data.\n",
+    "\n",
+    "There are many ways to change the name, the most obvious being the .rename() method. While it works, it has always felt a bit clunky. A much neater way is to use the .agg() method. It can perform several aggregations at once and has a syntax similar to .assign(): each keyword becomes the name of an output column. I also think it is clearer than a chained .rename()."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's generate some random data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "categories = list('ABCD')\n",
+    "n_samples = 10_000\n",
+    "\n",
+    "rng = np.random.default_rng()\n",
+    "df = pd.DataFrame({\n",
+    "    'category': rng.choice(categories, size=n_samples),\n",
+    "    'cost': rng.integers(1, 100, size=n_samples),\n",
+    "})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "(df\n",
+    "    .groupby('category')\n",
+    "    .count()\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The easiest way is to rename your columns afterwards:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "(df\n",
+    "    .groupby('category')\n",
+    "    .count()\n",
+    "    .rename(columns={'cost': 'count_cost'})\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `.agg()` method has an `.assign()`-like pattern that combines the rename with the aggregation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# https://linkedin.com/in/dennisbakhuis\n",
+    "(df\n",
+    "    .groupby('category')\n",
+    "    .agg(\n",
+    "        count=('cost', 'count'),\n",
+    "        sum_cost=('cost', 'sum'),\n",
+    "    )\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
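
For anyone who wants to try the tip outside the notebook, here is a minimal standalone sketch that puts the two approaches side by side. It is not part of the committed notebook: the fixed seed, the smaller sample size, and the extra `mean_cost` column are assumptions added for illustration. `pd.NamedAgg` is the explicit spelling of the `(column, aggfunc)` tuple and should behave the same way.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed and a smaller sample so the run is reproducible;
# the notebook itself uses an unseeded generator and 10_000 rows.
rng = np.random.default_rng(42)
n_samples = 1_000
df = pd.DataFrame({
    'category': rng.choice(list('ABCD'), size=n_samples),
    'cost': rng.integers(1, 100, size=n_samples),
})

# Approach 1: aggregate first, then rename the resulting column.
renamed = (df
    .groupby('category')
    .count()
    .rename(columns={'cost': 'count_cost'})
)

# Approach 2: named aggregation, where each keyword is the output column name
# and each value says which input column to aggregate and how.
named = (df
    .groupby('category')
    .agg(
        count_cost=('cost', 'count'),
        sum_cost=('cost', 'sum'),
        # pd.NamedAgg is the explicit form of the tuple above.
        mean_cost=pd.NamedAgg(column='cost', aggfunc='mean'),
    )
)

# Both approaches should yield the same counts under the same column name.
same_counts = renamed['count_cost'].equals(named['count_cost'])
print(named)
print('counts match:', same_counts)
```

The named form keeps the output name next to the aggregation that produces it, so adding a second or third aggregate does not require touching a separate rename mapping.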