| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Pandas tip #8: Explode your DataFrame\n", |
| 8 | + "A column in a Pandas DataFrame can hold practically any type of object. Not every type is ideal, but some are useful as intermediate values. One such type is a list, which adds a nested, higher-order structure to your tabular data.\n", |
| 9 | + "\n", |
| 10 | + "For example, I used a list column in a DataFrame of words to store the positions at which each word appeared in a large corpus. This nicely groups all relevant data; however, to be useful, the lists need to be flattened again. In plain Python, flattening is nicely done with a list comprehension. In Pandas, I found this nifty method: .explode().\n", |
| 11 | + "\n", |
| 12 | + "The .explode() method flattens the list in each row, independent of the length of each list, and duplicates the values of the row's other columns (and its index) for every list element. It is probably not used very often, but it is definitely a 'nice to have' method in your toolbox." |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "markdown", |
| 17 | + "metadata": {}, |
| 18 | + "source": [ |
| 19 | + "Let's generate some random data:" |
| 20 | + ] |
| 21 | + }, |
| 22 | + { |
| 23 | + "cell_type": "code", |
| 24 | + "execution_count": null, |
| 25 | + "metadata": {}, |
| 26 | + "outputs": [], |
| 27 | + "source": [ |
| 28 | + "import numpy as np\n", |
| 29 | + "import pandas as pd\n", |
| 30 | + "\n", |
| 31 | + "categories = [list('AB'), list('ABC'), list('ABCD')] \n", |
| 32 | + "n_samples = 100\n", |
| 33 | + "\n", |
| 34 | + "rng = np.random.default_rng()\n", |
| 35 | + "df = pd.DataFrame({\n", |
| 36 | + " 'client_id': np.arange(n_samples), \n", |
| 37 | + " 'product_category': [categories[i] for i in rng.integers(len(categories), size=n_samples)],\n", |
| 38 | + "}).set_index('client_id')" |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | + "cell_type": "code", |
| 43 | + "execution_count": null, |
| 44 | + "metadata": {}, |
| 45 | + "outputs": [], |
| 46 | + "source": [ |
| 47 | + "df.head()" |
| 48 | + ] |
| 49 | + }, |
| 50 | + { |
| 51 | + "cell_type": "markdown", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "Flattening a nested list in Python is easy with a list comprehension:" |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "code", |
| 59 | + "execution_count": null, |
| 60 | + "metadata": {}, |
| 61 | + "outputs": [], |
| 62 | + "source": [ |
| 63 | + "categories" |
| 64 | + ] |
| 65 | + }, |
| 66 | + { |
| 67 | + "cell_type": "code", |
| 68 | + "execution_count": null, |
| 69 | + "metadata": {}, |
| 70 | + "outputs": [], |
| 71 | + "source": [ |
| 72 | + "[item for sublist in categories for item in sublist]" |
| 73 | + ] |
| 74 | + }, |
| 75 | + { |
| 76 | + "cell_type": "markdown", |
| 77 | + "metadata": {}, |
| 78 | + "source": [ |
| 79 | + "Flattening a list-valued column is called exploding in Pandas:" |
| 80 | + ] |
| 81 | + }, |
| 82 | + { |
| 83 | + "cell_type": "code", |
| 84 | + "execution_count": null, |
| 85 | + "metadata": {}, |
| 86 | + "outputs": [], |
| 87 | + "source": [ |
| 88 | + "df.explode('product_category')" |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "markdown", |
| 93 | + "metadata": {}, |
| 94 | + "source": [ |
| 95 | + "If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis)." |
| 96 | + ] |
| 97 | + }, |
| 98 | + { |
| 99 | + "cell_type": "code", |
| 100 | + "execution_count": null, |
| 101 | + "metadata": {}, |
| 102 | + "outputs": [], |
| 103 | + "source": [] |
| 104 | + }, |
| 105 | + { |
| 106 | + "cell_type": "code", |
| 107 | + "execution_count": null, |
| 108 | + "metadata": {}, |
| 109 | + "outputs": [], |
| 110 | + "source": [] |
| 111 | + } |
| 112 | + ], |
| 113 | + "metadata": { |
| 114 | + "kernelspec": { |
| 115 | + "display_name": "Python 3", |
| 116 | + "language": "python", |
| 117 | + "name": "python3" |
| 118 | + }, |
| 119 | + "language_info": { |
| 120 | + "codemirror_mode": { |
| 121 | + "name": "ipython", |
| 122 | + "version": 3 |
| 123 | + }, |
| 124 | + "file_extension": ".py", |
| 125 | + "mimetype": "text/x-python", |
| 126 | + "name": "python", |
| 127 | + "nbconvert_exporter": "python", |
| 128 | + "pygments_lexer": "ipython3", |
| 129 | + "version": "3.7.7" |
| 130 | + } |
| 131 | + }, |
| 132 | + "nbformat": 4, |
| 133 | + "nbformat_minor": 4 |
| 134 | +} |
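
To make the behaviour of .explode() easier to check by eye, here is a minimal sketch on a tiny, hand-made DataFrame instead of random data. The three clients and their category lists are made up for illustration and are not part of the notebook; the empty list is included only to show how .explode() treats it.

```python
import pandas as pd

# Hypothetical toy data (not from the notebook): one list of categories per client.
df = pd.DataFrame(
    {
        "client_id": [0, 1, 2],
        "product_category": [["A", "B"], ["A", "B", "C"], []],
    }
).set_index("client_id")

# One output row per list element; the index value is repeated for each element.
# An empty list becomes a single row holding a missing value (NaN).
exploded = df.explode("product_category")

# The same flattening written as a plain list comprehension, for comparison.
# The empty list is replaced by a single NaN to mirror what .explode() does.
manual = [
    (client_id, item)
    for client_id, items in df["product_category"].items()
    for item in (items if len(items) > 0 else [float("nan")])
]

print(exploded)
print(manual)
```

If the repeated index values get in the way, calling .reset_index() on the result turns client_id back into a regular column.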