One of the most common Python libraries used for data analysis is pandas.
Within pandas, you can use the dtype attribute to check the data type of a pandas Series or of a single column in a pandas DataFrame.
There are five main dtypes in pandas:
- object: Text, or a mix of numeric and non-numeric values
- bool: True or False values
- int64: Integer values
- float64: Floating point values
- datetime64: Date and time values
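As a rough illustration of the list above, the sketch below builds a small DataFrame with one column per dtype. The column names and values are made up for this example. Depending on your pandas version, the text column may be reported as object or as a dedicated string dtype.

```python
import pandas as pd

# Hypothetical sample data: one column for each dtype in the list above
sample = pd.DataFrame({
    'name': ['Ann', 'Bob'],                                   # text
    'active': [True, False],                                  # bool
    'count': [1, 2],                                          # int64
    'score': [0.5, 1.5],                                      # float64
    'joined': pd.to_datetime(['2024-01-01', '2024-06-15']),   # datetime64
})

# Show the dtype that pandas assigned to each column
print(sample.dtypes)
```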
It’s useful to know the dtypes of objects in pandas because the dtype affects how calculations are performed, and it often explains errors you encounter when performing certain operations.
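To make this concrete, here is a minimal sketch of how a dtype can change the result of a calculation. The prices Series is a hypothetical example in which numbers were stored as text, and the object dtype is set explicitly so the behavior does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical column where numbers were stored as text (object dtype)
prices = pd.Series(['10', '20', '30'], dtype='object')

# Summing text joins the strings together instead of adding numbers
text_total = prices.sum()
print(text_total)

# Converting to an integer dtype first gives ordinary arithmetic
numeric_total = prices.astype('int64').sum()
print(numeric_total)
```

Checking the dtype before running the calculation would reveal that the column holds text rather than integers.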
In practice, you can check the dtype of a single column in a pandas DataFrame or of a single pandas Series by using the following syntax:
df['some_column'].dtype
This will return the dtype for the column that we specify.
Or, you can use the dtypes attribute to return the data type of every column in a pandas DataFrame:
df.dtypes
This will return the dtype of each column, which is particularly useful so that we don’t have to write a for-loop or type out dtype multiple times to find the data type of each column.
The following example shows how to check the dtype of columns in a pandas DataFrame in practice.
Example: How to Use dtype and dtypes in Pandas
Suppose we create the following pandas DataFrame that contains information about various basketball players:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 9],
                   'minutes': [2.1, 4, 5.8, 9, 9.2, 3.5],
                   'all_star': [True, False, False, True, True, True]})

#view DataFrame
print(df)

  team  points  assists  minutes  all_star
0    A      18        5      2.1      True
1    A      22        7      4.0     False
2    A      19        7      5.8     False
3    B      14        9      9.0      True
4    B      14       12      9.2      True
5    B      11        9      3.5      True
We can see that the DataFrame has five total columns.
Suppose that we would like to display the data type of just the assists column.
We can use the following syntax to do so:
#display data type of 'assists' column
df['assists'].dtype
dtype('int64')
This returns dtype('int64'), which tells us that the assists column holds integer values.
If we’d like, we can also use the following syntax to display the data type of just the assists and minutes columns:
#display data type of 'assists' and 'minutes' columns
df[['assists', 'minutes']].dtypes
assists int64
minutes float64
dtype: object
Note that when specifying multiple columns, we must use double brackets. With single brackets, pandas looks for one column labeled ('assists', 'minutes') and raises a KeyError.
From the output we can see that the assists column is an integer data type while the minutes column is a floating point data type.
This makes sense because the minutes column contains decimal values that represent fractional minutes played by each athlete in a game.
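Two details from this step can be checked with a short sketch: selecting several columns requires a list of names, and a column that mixes whole numbers with decimals is stored entirely as float64. The DataFrame below is a shortened, hypothetical version of the one in this example.

```python
import pandas as pd

# Hypothetical, shortened version of the example DataFrame
df = pd.DataFrame({'assists': [5, 7, 7],
                   'minutes': [2.1, 4, 5.8]})

# A list of column names (double brackets) returns a DataFrame
subset_types = df[['assists', 'minutes']].dtypes
print(subset_types)

# Single brackets look for one column labeled ('assists', 'minutes')
try:
    df['assists', 'minutes']
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)

# The whole number 4 is stored as 4.0 because the column is float64
minutes_value = df['minutes'].iloc[1]
print(minutes_value)
```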
Lastly, we can use the following syntax to display the data type of each column in the pandas DataFrame:
#display data type of each column in DataFrame
df.dtypes
team object
points int64
assists int64
minutes float64
all_star bool
dtype: object
The output shows the data type of each column in the DataFrame.
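The last line of that output, dtype: object, describes the result itself rather than any column: df.dtypes returns a pandas Series whose index holds the column names and whose values are the dtypes. The sketch below, which uses a shortened, hypothetical version of the example DataFrame, shows one way this can be used to look up or filter columns by type.

```python
import pandas as pd

# Hypothetical, shortened version of the example DataFrame
df = pd.DataFrame({'points': [18, 22, 19],
                   'minutes': [2.1, 4, 5.8],
                   'all_star': [True, False, False]})

# df.dtypes is a Series: index = column names, values = dtypes
col_types = df.dtypes
print(type(col_types))

# Look up the dtype of one column by name
print(col_types['minutes'])

# Keep only the names of the columns stored as int64
int_cols = col_types[col_types == 'int64'].index.tolist()
print(int_cols)
```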
Note: In practice, checking df.dtypes is one of the most common steps when analyzing real-world data, since it shows the underlying data types that you’re working with in a particular DataFrame.