10 Python One-Liners for Statistical Plotting


Statistical plotting is an important part of exploratory data analysis (EDA). Quickly visualizing distributions, relationships, and anomalies helps you spot patterns and generate insights before any formal modeling.

This tutorial walks you through 10 Python one-liners for statistical plotting, using the Arabica coffee quality dataset. The dataset includes over 1,300 coffee samples from around the world, each with sensory cupping scores (such as aroma, flavor, acidity, and body) plus metadata like country of origin, variety, processing method, and altitude.

Let’s dive in!

Getting Started

We’ll be using pandas, matplotlib, and seaborn, which together offer a rich and intuitive plotting interface. To follow along, first load the dataset into your environment:

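import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv'
# The first CSV column is an unnamed row index; reading it as the index
# keeps it out of numeric summaries such as the correlation heatmap
df = pd.read_csv(url, index_col=0)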
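The snippets below assume a Jupyter notebook, where each cell gets its own figure. If you run them as a plain script, consecutive one-liners draw onto the same current Axes and pile up in one chart. Here is a minimal sketch of one workaround. It uses a hypothetical helper (not part of any library) that gives each plot a fresh figure through the ax= parameter, which pandas' hist() and seaborn's axes-level functions both accept:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_on_new_figure(plot_fn, df: pd.DataFrame, figsize=(8, 5)):
    """Run one plotting one-liner on its own figure and return that figure.

    Hypothetical helper: `plot_fn` is any callable taking (df, ax) that draws onto `ax`.
    """
    fig, ax = plt.subplots(figsize=figsize)  # a fresh Figure/Axes for every call
    plot_fn(df, ax)
    fig.tight_layout()
    return fig
```

For example, plot_on_new_figure(lambda d, ax: d['Total.Cup.Points'].hist(bins=20, ax=ax), df).savefig('hist.png') saves the first histogram to its own file. Alternatively, call plt.show() once at the end of the script to display every open figure.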

1. Histogram of Total Cup Points

Visualize the distribution of scores with a simple histogram.

df['Total.Cup.Points'].hist(bins=20)

Pro tip: Use seaborn.histplot(..., kde=True) to overlay a kernel density estimate on the histogram.

2. Boxplot by Country

Compare score distributions across countries.

sns.boxplot(data=df, x='Country.of.Origin', y='Total.Cup.Points')

Pro tip: Add order= to control country order or focus on top N.
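One way to apply the order= tip is to plot only the N countries with the most samples. In this sketch, n=10 is an arbitrary default and the function name is illustrative:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def top_n_boxplot(df: pd.DataFrame, cat: str = "Country.of.Origin",
                  val: str = "Total.Cup.Points", n: int = 10):
    """Boxplot of `val` for the `n` most frequent categories of `cat` (illustrative helper)."""
    # value_counts() sorts by frequency, so the first n labels are the most common
    top = df[cat].value_counts().index[:n].tolist()
    fig, ax = plt.subplots(figsize=(10, 5))
    # order= fixes the left-to-right order and leaves out categories not in the list
    sns.boxplot(data=df, x=cat, y=val, order=top, ax=ax)
    ax.tick_params(axis="x", labelrotation=45)  # keep long country names readable
    return ax, top
```

The dataset covers many countries, so limiting the categories and rotating the labels keeps the boxes legible.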

3. KDE Plot of Acidity

Get a smooth estimate of the acidity distribution.

sns.kdeplot(df['Acidity'], fill=True)

Pro tip: Switch to the data=df, x='Acidity' form and add hue= to compare KDEs across groups.

4. Violin Plot by Processing Method

Combine distribution and summary stats in one plot.

sns.violinplot(data=df, x='Processing.Method', y='Body')

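Pro tip: Violin plots are a good alternative to boxplots when groups are large, because the density estimate behind each violin needs enough points to be reliable. For small groups, a boxplot is the safer choice.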
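To act on the sample-size caveat, you can check group sizes first and draw violins only for well-populated groups. The min_n=30 cutoff here is an illustrative assumption, not a statistical rule:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def violin_for_large_groups(df: pd.DataFrame, cat: str = "Processing.Method",
                            val: str = "Body", min_n: int = 30):
    """Violin plot limited to categories with at least `min_n` values (hypothetical threshold)."""
    # Count non-missing values per category; groupby skips NaN categories by default
    sizes = df.dropna(subset=[val]).groupby(cat)[val].size()
    keep = sizes[sizes >= min_n].index.tolist()
    fig, ax = plt.subplots()
    sns.violinplot(data=df[df[cat].isin(keep)], x=cat, y=val, order=keep, ax=ax)
    return ax, sizes
```

The returned sizes Series also shows which processing methods were left out.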

5. Correlation Heatmap

Visualize relationships between numerical variables.

sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")

Pro tip: Focus on a subset of relevant columns to reduce clutter.
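Here is a sketch of the subset approach, assuming the six sensory columns below are the ones you care about. Masking the upper triangle and pinning the color scale to [-1, 1] are optional choices that make the chart easier to read:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# An assumed "relevant subset": the main sensory attributes in the dataset
SENSORY_COLS = ["Aroma", "Flavor", "Aftertaste", "Acidity", "Body", "Balance"]


def sensory_heatmap(df: pd.DataFrame, cols=SENSORY_COLS):
    """Annotated heatmap of pairwise Pearson correlations among `cols` (illustrative helper)."""
    corr = df[cols].corr()  # DataFrame.corr() uses Pearson by default
    # Hide the diagonal (always 1) and the upper triangle (a mirror of the lower one)
    mask = np.triu(np.ones_like(corr, dtype=bool))
    fig, ax = plt.subplots(figsize=(7, 6))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, square=True, ax=ax)
    return corr
```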

6. Pairplot of Key Metrics

Quickly explore pairwise relationships and distributions.

sns.pairplot(df[['Acidity', 'Aroma', 'Body', 'Flavor']])

Pro tip: Add 'Country.of.Origin' to the selected columns and pass hue='Country.of.Origin' to color points by category.

7. Countplot of Variety

Count how many samples belong to each variety.

sns.countplot(data=df, y='Variety', order=df['Variety'].value_counts().index)

Pro tip: Use y= instead of x= for long category names.

8. Scatterplot of Acidity vs Body

Reveal potential relationships or clusters.

sns.scatterplot(data=df, x='Acidity', y='Body')

Pro tip: Add hue='Processing.Method' to check whether processing methods form distinct clusters.
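Here is a sketch of the hue= tip that also guards against overplotting. The assumption is that sensory scores sit in a narrow range, so many samples can share nearly the same coordinates. Semi-transparency and smaller markers (alpha= and s=, both forwarded to matplotlib) make dense regions show up as darker spots:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def scatter_by_method(df: pd.DataFrame, x: str = "Acidity", y: str = "Body",
                      hue: str = "Processing.Method"):
    """Scatterplot of `x` vs `y`, colored by `hue` (illustrative helper)."""
    fig, ax = plt.subplots()
    # Small, semi-transparent markers let stacked points read as darker regions
    sns.scatterplot(data=df, x=x, y=y, hue=hue, alpha=0.4, s=20, ax=ax)
    return ax
```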

9. Swarmplot of Aroma by Country (Altitude Below 2,000 m)

Show every individual value, with points nudged apart so none overlap, while preserving the shape of each distribution.

sns.swarmplot(data=df[df['altitude_mean_meters'] < 2000], x='Aroma', y='Country.of.Origin')

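Pro tip: Swarmplots show raw data without overlap, but with hundreds of points per category seaborn cannot place them all and issues a warning. In that case, shrink the markers with size=, plot fewer categories, or switch to sns.stripplot().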
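Here is a hedged sketch for crowded categories. It keeps only the best-represented countries (n=8 is arbitrary), shrinks the markers, and optionally falls back to stripplot(), which jitters points randomly instead of packing them. The altitude column name is a parameter defaulting to altitude_mean_meters. That default is an assumption, so check df.columns against your copy of the CSV:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def aroma_by_country(df: pd.DataFrame, alt_col: str = "altitude_mean_meters",
                     max_alt: float = 2000, n: int = 8, swarm: bool = True):
    """Aroma per country for the n best-represented countries below `max_alt` meters.

    `alt_col` is an assumed column name; `n` is an arbitrary default.
    """
    # NaN altitudes compare as False, so rows without an altitude are dropped too
    low = df[df[alt_col] < max_alt]
    top = low["Country.of.Origin"].value_counts().index[:n].tolist()
    subset = low[low["Country.of.Origin"].isin(top)]
    fig, ax = plt.subplots(figsize=(8, 6))
    if swarm:
        # small markers give the swarm layout room to place every point
        sns.swarmplot(data=subset, x="Aroma", y="Country.of.Origin", order=top, size=3, ax=ax)
    else:
        # stripplot jitters points randomly instead, which copes with dense groups
        sns.stripplot(data=subset, x="Aroma", y="Country.of.Origin", order=top,
                      jitter=0.25, alpha=0.5, size=3, ax=ax)
    return ax, subset
```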

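10. Time Series Line Plot (Synthetic Dates)

Demonstrate a rolling-mean line plot even when timestamps are missing. Because the dates below are invented, the curve reflects only the row order of the file, not a real trend over time.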

df['fake_date'] = pd.date_range(start='2020-01-01', periods=len(df), freq='D')
df.set_index('fake_date')['Total.Cup.Points'].rolling(30).mean().plot()

Pro tip: With real timestamps, rolling means smooth out sample-to-sample noise and help reveal longer-term trends and seasonal patterns.
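If you do have genuine timestamps, a time-based window is usually more meaningful than a fixed count of rows. Here is a minimal sketch, assuming a datetime column whose name you pass in. The CSV appears to include a Grading.Date column, but it is read as text and would need parsing with pd.to_datetime first:

```python
import matplotlib.pyplot as plt
import pandas as pd


def rolling_trend(df: pd.DataFrame, date_col: str,
                  value_col: str = "Total.Cup.Points", window: str = "90D"):
    """Plot a time-based rolling mean of `value_col`; `date_col` must already hold datetimes."""
    series = (
        df.dropna(subset=[date_col, value_col])
          .sort_values(date_col)  # offset windows like "90D" need a sorted DatetimeIndex
          .set_index(date_col)[value_col]
    )
    # each point averages every sample dated within the trailing window, however many there are
    smoothed = series.rolling(window).mean()
    fig, ax = plt.subplots()
    smoothed.plot(ax=ax)
    ax.set_ylabel(f"{value_col} ({window} rolling mean)")
    return smoothed
```

Unlike rolling(30), which always averages 30 rows, rolling('90D') averages however many samples fall within each 90-day span, so irregular sampling does not distort the time axis.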

Conclusions

Statistical plotting is more than showing off nice charts; it’s about extracting meaning from your data.

As we have demonstrated, a single line of code can expose patterns, identify trends, and surface outliers that would be hard to spot by “eyeballing” a table of raw data.

These 10 one-liners will help you explore, communicate, and validate your data-driven stories faster than ever. You can explore the full notebook on my GitHub repo.
