
Statistical plotting is an important part of exploratory data analysis (EDA). Quickly and easily visualizing distributions, relationships, and anomalies can help you recognize patterns and insights in no time.
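This tutorial will take you through 10 one-liners in Python for statistical plotting, using the Arabica coffee quality dataset. This dataset includes over 1,300 coffee samples from around the world. Each sample has sensory cupping scores (such as Aroma, Flavor, Acidity, and Body) plus metadata such as country of origin, variety, processing method, and altitude.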
Let’s dive in!
Getting Started
We’ll be using pandas, matplotlib, and seaborn, which together offer a rich and intuitive plotting interface. To follow along, first load the dataset into your environment:
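import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the cleaned Arabica coffee quality data
url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv'
df = pd.read_csv(url)

# Most one-liners below draw on the current Matplotlib axes: run each one in its
# own notebook cell, or call plt.show() after it when working in a script.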
1. Histogram of Total Cup Points
Visualize the distribution of scores with a simple histogram.
df['Total.Cup.Points'].hist(bins=20)
Pro tip: Use sns.histplot(df['Total.Cup.Points'], bins=20, kde=True) to overlay a kernel density estimate on the histogram.
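It helps to know what bins=20 actually does. The sketch below splits the range from the minimum to the maximum into equal-width bins and counts the values in each one. It follows the numpy.histogram convention that Matplotlib's hist relies on: every bin is half-open except the last, which also includes the maximum. This is a simplified illustration, and the scores are hypothetical rather than taken from the dataset.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which also
    includes the maximum value (the numpy.histogram convention).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Bin index from the distance to the minimum; clamp the max into the last bin
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, bins - 1)] += 1
    return counts

# Hypothetical Total.Cup.Points values, not taken from the dataset
scores = [80.5, 81.0, 82.25, 82.5, 83.0, 84.75, 85.0, 86.5]
print(histogram_counts(scores, bins=3))  # → [3, 2, 3]
```

With a bins value that is too small, distinct peaks merge into one bar. With one that is too large, the bars turn into noise. It is worth trying a few values.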
2. Boxplot by Country
Compare score distributions across countries.
sns.boxplot(data=df, x='Country.of.Origin', y='Total.Cup.Points')
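Pro tip: Pass order= a list of countries to control their order. Passing only the most-sampled countries, e.g. order=df['Country.of.Origin'].value_counts().index[:10], also restricts the plot to those countries and keeps the labels readable.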
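Each box summarizes a handful of numbers. The sketch below computes the Tukey-style summary that a standard boxplot draws: the quartiles, whiskers that reach the most extreme points within 1.5 × IQR of the box, and everything beyond those fences flagged as an outlier. It assumes linear-interpolation quartiles (statistics' "inclusive" method). To our understanding, these should match the defaults Matplotlib uses under seaborn, but treat it as an illustration. The input values are hypothetical.

```python
import statistics

def box_stats(values):
    """Quartiles, 1.5 * IQR whiskers, and outliers, as a Tukey boxplot shows them."""
    # "inclusive" quantiles interpolate linearly between order statistics
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical Total.Cup.Points values for a single country
stats = box_stats([78.5, 80.5, 81.0, 81.5, 82.0, 82.5, 83.0, 83.5, 90.0])
print(stats)
```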
3. KDE Plot of Acidity
Get a smooth estimate of the acidity distribution.
sns.kdeplot(df['Acidity'], fill=True)
Pro tip: Use the data= form with hue= to compare KDEs across groups, e.g. sns.kdeplot(data=df, x='Acidity', hue='Processing.Method', fill=True).
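A KDE curve is an average of small bell curves, one centred on each observation. The sketch below evaluates a 1-D Gaussian KDE using Scott's rule of thumb for the bandwidth, which is seaborn's default. seaborn adds further details, such as bw_adjust scaling and how far the grid extends past the data, so read this as a minimal illustration. The acidity values are hypothetical.

```python
import math
import statistics

def gaussian_kde(values, x, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate at point x."""
    n = len(values)
    if bandwidth is None:
        # Scott's rule of thumb: sample standard deviation times n ** (-1/5)
        bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    # Average of normal densities centred on each observation
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

# Hypothetical Acidity scores, not taken from the dataset
acidity = [7.25, 7.5, 7.5, 7.58, 7.67, 7.75, 7.83, 8.0]
grid = [7.0 + 0.05 * i for i in range(21)]
density = [gaussian_kde(acidity, x) for x in grid]
```

A smaller bandwidth follows the data more closely but looks bumpier. A larger one smooths away real structure.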
4. Violin Plot by Processing Method
Combine distribution and summary stats in one plot.
sns.violinplot(data=df, x='Processing.Method', y='Body')
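Pro tip: Violin plots are a good alternative to boxplots when each group has plenty of samples, because the density outline can reveal multiple peaks that a box hides. With only a few points per group, the outline is unreliable.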
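Each violin's outline is a KDE, so check how many samples sit behind each group before trusting its shape. In pandas, df.groupby('Processing.Method')['Body'].agg(['count', 'median']) does this in one line. The plain-Python sketch below shows the same grouping logic on hypothetical (method, Body) pairs, skipping rows whose group is missing.

```python
from collections import defaultdict
import statistics

def group_summary(pairs):
    """Return {group: (count, median)} from (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in pairs:
        if group is not None:  # skip rows with a missing group label
            groups[group].append(value)
    return {g: (len(vals), statistics.median(vals)) for g, vals in groups.items()}

# Hypothetical (processing method, Body score) pairs
rows = [("Washed", 7.5), ("Washed", 7.75), ("Natural", 8.0), (None, 7.0), ("Washed", 7.67)]
summary = group_summary(rows)
print(summary)
```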
5. Correlation Heatmap
Visualize relationships between numerical variables.
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap="coolwarm")
Pro tip: Focus on a subset of relevant columns to reduce clutter.
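Each heatmap cell is a Pearson correlation coefficient between two columns, ranging from -1 to 1. The sketch below computes that matrix for a small dict of hypothetical columns. It assumes there are no missing values, whereas pandas' corr() handles missing values pairwise. Passing in only the columns you care about is the same idea as the subsetting tip above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def corr_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical scores for four samples
cols = {
    "Aroma": [7.5, 7.67, 7.83, 8.0],
    "Flavor": [7.42, 7.58, 7.92, 8.08],
    "Body": [7.75, 7.5, 7.67, 7.58],
}
matrix = corr_matrix(cols)
```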
6. Pairplot of Key Metrics
Quickly explore pairwise relationships and distributions.
sns.pairplot(df[['Acidity', 'Aroma', 'Body', 'Flavor']])
Pro tip: Include 'Country.of.Origin' in the column selection and pass hue='Country.of.Origin' to color points by category.
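A pairplot grid grows quadratically: four columns already produce 16 panels. The sketch below only describes the layout and draws nothing. The panel in row r and column c plots column c on the x-axis against column r on the y-axis, and the diagonal shows each variable's own distribution. The diagonal rule reflects seaborn's default diag_kind="auto" for scatter pairplots (a histogram without hue, a KDE with hue). pairplot_layout is a hypothetical helper written for illustration.

```python
def pairplot_layout(columns, hue=None):
    """Describe the panel grid of a scatter pairplot for the given columns."""
    # diag_kind="auto": histogram without hue, KDE when hue is set
    diag = "hist" if hue is None else "kde"
    return [
        [(diag, x) if i == j else ("scatter", x, y) for j, x in enumerate(columns)]
        for i, y in enumerate(columns)
    ]

layout = pairplot_layout(["Acidity", "Aroma", "Body", "Flavor"])
print(len(layout) * len(layout[0]))  # number of panels → 16
```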
7. Countplot of Variety
Count how many samples belong to each variety.
sns.countplot(data=df, y='Variety', order=df['Variety'].value_counts().index)
Pro tip: Use y= instead of x= for long category names.
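The order= argument above sorts the bars from most to least common. The sketch below reproduces that ordering with collections.Counter on a hypothetical list of varieties, dropping missing values the way value_counts() does by default. Ties may come out in a different order than pandas produces.

```python
from collections import Counter

# Hypothetical Variety column; None stands in for a missing value
varieties = ["Caturra", "Bourbon", None, "Caturra", "Typica", "Caturra", "Bourbon"]

# Count each non-missing variety and list them from most to least common
counts = Counter(v for v in varieties if v is not None)
order = [name for name, _ in counts.most_common()]
print(order)  # → ['Caturra', 'Bourbon', 'Typica']
```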
8. Scatterplot of Acidity vs Body
Reveal potential relationships or clusters.
sns.scatterplot(data=df, x='Acidity', y='Body')
Pro tip: Add hue='Processing.Method' to color points by class and check whether the groups separate.
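To put a number on a visible trend, sns.regplot(data=df, x='Acidity', y='Body') draws the same scatter with a fitted linear regression line. The sketch below computes an ordinary least-squares slope and intercept for such a line on hypothetical (Acidity, Body) pairs. It omits the confidence band that regplot also shades.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (Acidity, Body) pairs
acidity = [7.25, 7.5, 7.75, 8.0]
body = [7.33, 7.5, 7.58, 7.83]
slope, intercept = fit_line(acidity, body)
```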
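9. Swarmplot of Aroma by Country
Show every individual value while keeping the points from overlapping. The filter keeps only samples with a recorded mean altitude below 2,000 m, which also drops rows with a missing altitude.
sns.swarmplot(data=df[df['Altitude.mean.meters'] < 2000], x='Aroma', y='Country.of.Origin')
Pro tip: Swarmplots work best with small to moderate samples. When a category has too many points to place without overlap, seaborn warns about it, and sns.stripplot() with jitter is the better choice.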
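The two plots spread points differently. sns.stripplot() adds a small random offset along the category axis, while sns.swarmplot() computes positions deterministically so that points do not overlap. The sketch below illustrates the random-jitter idea only. The amount and seed parameters are hypothetical, not seaborn's own, and the country list is made up.

```python
import random

def jitter_positions(categories, amount=0.1, seed=0):
    """Map each category to an integer slot and add a uniform random offset."""
    # Slots follow first-seen order: 0, 1, 2, ...
    slots = {c: i for i, c in enumerate(dict.fromkeys(categories))}
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    return [slots[c] + rng.uniform(-amount, amount) for c in categories]

countries = ["Ethiopia", "Kenya", "Ethiopia", "Colombia", "Kenya"]
positions = jitter_positions(countries)
```

Random jitter is cheap and handles any number of points, but points can still land on top of each other. A swarm avoids that at the cost of needing room for every point.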
10. Time Series Line Plot (Synthetic Date)
Practice drawing trend lines when a dataset has no usable timestamps by attaching synthetic daily dates.
df['fake_date'] = pd.date_range(start='2020-01-01', periods=len(df), freq='D')
df.set_index('fake_date')['Total.Cup.Points'].rolling(30).mean().plot()
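Pro tip: Because the synthetic dates simply follow row order, this plot demonstrates the rolling-mean technique rather than a real trend over time. Apply the same pattern to a genuine date column to look for moving trends and seasonal patterns.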
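For an integer window, .rolling(30).mean() averages each value with the 29 values before it. The first 29 positions lack a full window, so pandas returns NaN for them, because min_periods defaults to the window size. The plain-Python sketch below does the same with a running sum and uses None in place of NaN.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window get None."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / window if i >= window - 1 else None)
    return out

print(rolling_mean([1, 2, 3, 4, 5], window=3))  # → [None, None, 2.0, 3.0, 4.0]
```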
Conclusions
Statistical plotting is more than showing off nice charts; it’s about extracting meaning from your data.
As we have demonstrated, a single line of code can expose patterns, reveal trends, and surface outliers that would be hard to spot by “eyeballing” a table of raw data.
These 10 one-liners will help you explore, communicate, and validate your data-driven stories faster than ever. You can explore the full notebook on my GitHub repo.
