Get unique values from a column in Pandas DataFrame
In Pandas, retrieving the unique values of a DataFrame column is useful for analyzing categorical data or identifying duplicates. Let's learn how to get unique values from a column in a Pandas DataFrame.
Get the Unique Values of a Column using unique()
The .unique() method returns the distinct values of a column as a NumPy array (or as a pandas extension array for extension dtypes). It is useful when working with categorical data or when checking which values a column actually contains. The values are not sorted: they are returned in the order of their first occurrence.
Syntax: DataFrame['column_name'].unique()
Consider the following example: we are retrieving and printing the unique values from the 'B' column using the unique() method.
# Import pandas package
import pandas as pd
# create a dictionary with five fields each
data = {
'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
'E': ['E1', 'E1', 'E1', 'E1', 'E1']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print("Pandas DataFrame:")
print(df)
# Get the unique values of 'B' column
unique_values = df['B'].unique()
# Print the unique values
print("\nUnique values in 'B' column:")
print(unique_values)
Output:
The unique values returned are ['B1', 'B2', 'B3', 'B4'].
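Because .unique() returns an array rather than a list, it is often convenient to convert the result before passing it on. The following is a minimal sketch that rebuilds only column 'B' from the example above; the variable names unique_list and sorted_unique are illustrative.

```python
import pandas as pd

# Rebuild only the 'B' column from the example above
df = pd.DataFrame({'B': ['B1', 'B2', 'B3', 'B4', 'B4']})

# Convert the array of unique values into a plain Python list
unique_list = list(df['B'].unique())
print(unique_list)

# unique() does not sort, so sort explicitly when a sorted result is needed
sorted_unique = sorted(df['B'].unique(), reverse=True)
print(sorted_unique)
```

Sorting is a separate step here because .unique() only guarantees order of first appearance.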
Count the unique values in a column using nunique()
Let's use the .nunique() method to count the unique values in columns 'A' through 'D' of the DataFrame above. By default, missing values (NaN) are not counted.
# Get number of unique values in column 'A'
unique_values_A = df['A'].nunique()
# Print the number of unique values
print("Number of unique values in 'A' column:", unique_values_A)
# Get number of unique values in column 'B'
unique_values_B = df['B'].nunique()
# Print the number of unique values
print("\nNumber of unique values in 'B' column:", unique_values_B)
# Get number of unique values in column 'C'
unique_values_C = df['C'].nunique()
# Print the number of unique values
print("\nNumber of unique values in 'C' column:", unique_values_C)
# Get number of unique values in column 'D'
unique_values_D = df['D'].nunique()
# Print the number of unique values
print("\nNumber of unique values in 'D' column:", unique_values_D)
Output:
Number of unique values in 'A' column: 5
Number of unique values in 'B' column: 4
Number of unique values in 'C' column: 3
Number of unique values in 'D' column: 2
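The per-column calls above can also be collapsed into one call on the whole DataFrame. The sketch below shows that, and also how missing values are treated; the Series s with a NaN in it is a made-up example, not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1']})

# DataFrame.nunique() returns a Series with one count per column
counts_per_column = df.nunique()
print(counts_per_column)

# Hypothetical column containing a missing value
s = pd.Series(['x', np.nan, 'x', 'y'])
print(s.nunique())              # NaN is excluded by default
print(s.nunique(dropna=False))  # NaN is counted as a value
print(len(s.unique()))          # unique() keeps NaN in its result
```

This difference means len(column.unique()) and column.nunique() can disagree when the column contains missing values.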
In addition to the .unique() method, there are other ways to retrieve unique values from a Pandas DataFrame. They are described in the sections below.
Get Unique values from a Column in Pandas DataFrame using .drop_duplicates()
The .drop_duplicates() method removes duplicate values from the selected column and returns a Series that contains only the first occurrence of each value.
Syntax: DataFrame['column_name'].drop_duplicates()
Example: Get unique values from column 'C'
unique_values = df['C'].drop_duplicates()
print(unique_values)
Output:
0    C1
1    C2
2    C3
Name: C, dtype: object
This method returns the unique values as a Series and keeps the original index label of each value's first occurrence.
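Since the original index labels are kept, the result can have gaps in its index. The sketch below illustrates this with the keep='last' option and then renumbers the result with reset_index(drop=True). It rebuilds only column 'C', and the variable names are illustrative.

```python
import pandas as pd

# Rebuild only the 'C' column from the example above
df = pd.DataFrame({'C': ['C1', 'C2', 'C3', 'C3', 'C3']})

# Keep the last occurrence of each value instead of the first
last_c = df['C'].drop_duplicates(keep='last')
print(last_c)

# Discard the original labels and renumber from 0
renumbered = last_c.reset_index(drop=True)
print(renumbered)
```

Whether to keep or reset the index depends on whether the positions in the original DataFrame still matter for later steps.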
Extracting Unique values in Pandas DataFrame Using .value_counts()
The .value_counts() method counts the occurrences of each unique value in the column and returns the result as a Series, sorted by count in descending order.
Syntax: DataFrame['column_name'].value_counts()
Example: Get unique values from column 'D' along with their counts
unique_values_count = df['D'].value_counts()
print(unique_values_count)
Output:
D
D2    4
D1    1
Name: count, dtype: int64
This method provides both the unique values and the frequency of each value. To extract just the unique values, you can use .index on the result.
unique_values = df['D'].value_counts().index
print(unique_values)
Output:
Index(['D2', 'D1'], dtype='object', name='D')
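If a plain list or relative frequencies are preferred over an Index of raw counts, the result can be adapted as sketched below. The example rebuilds only column 'D', and the variable names are illustrative.

```python
import pandas as pd

# Rebuild only the 'D' column from the example above
df = pd.DataFrame({'D': ['D1', 'D2', 'D2', 'D2', 'D2']})

counts = df['D'].value_counts()

# Unique values as a plain list, most frequent first
unique_by_frequency = counts.index.tolist()
print(unique_by_frequency)

# Share of rows for each unique value instead of raw counts
proportions = df['D'].value_counts(normalize=True)
print(proportions)
```

Note that the order here reflects frequency, not order of appearance, so it can differ from what .unique() returns.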
Get Unique values from a column in Pandas DataFrame using set()
You can also use Python’s built-in set() function, which converts the column values into a set, automatically removing duplicates.
Syntax: set(DataFrame['column_name'])
Example: Get unique values from column 'D'
unique_values = set(df['D'])
print(unique_values)
Output:
{'D1', 'D2'}
Using set() does not preserve the order of the unique values, so the printed order may differ between runs, but it is a quick way to get distinct values.
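When a predictable order is needed from built-in tools alone, two common options are sorting the set or using dict.fromkeys(), which keeps the first-appearance order in Python 3.7 and later. This is a minimal sketch on column 'D'; the variable names are illustrative.

```python
import pandas as pd

# Rebuild only the 'D' column from the example above
df = pd.DataFrame({'D': ['D1', 'D2', 'D2', 'D2', 'D2']})

# Option 1: sort the set to get a deterministic order
sorted_values = sorted(set(df['D']))
print(sorted_values)

# Option 2: dict keys are unique and keep insertion order
ordered_values = list(dict.fromkeys(df['D']))
print(ordered_values)
```

Both options return ordinary Python lists, which is convenient when the result is used outside Pandas.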
In short:
- The .unique() method returns a NumPy array of unique values, preserving their order of appearance.
- The .drop_duplicates() method returns a Series with unique values, preserving the original index.
- The .value_counts() method provides both the unique values and their frequency count.
- The set() function quickly returns unique values but does not preserve their order.
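The differences summarized above can be checked side by side. The sketch below uses a small, made-up integer Series (not the DataFrame from the earlier examples) so that the return types are easy to inspect.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that duplicates are not adjacent
s = pd.Series([3, 1, 3, 2, 1])

as_array = s.unique()            # NumPy array, order of first appearance
as_series = s.drop_duplicates()  # Series, original index labels kept
as_counts = s.value_counts()     # Series of counts indexed by value
as_set = set(s)                  # Python set, no guaranteed order

print(type(as_array).__name__, as_array.tolist())
print(type(as_series).__name__, list(as_series.index))
print(type(as_counts).__name__, as_counts.to_dict())
print(type(as_set).__name__, sorted(as_set))
```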