Pandas Read CSV in Python

CSV (Comma-Separated Values) files store tabular data as plain text.

To load data from a CSV file, use the Pandas read_csv() function, which returns the data as a DataFrame.
First import the Pandas library, then call read_csv() to load the file into a DataFrame.

In the code below, we work with a CSV file named people.csv, which contains sample data about people.

Python

import pandas as pd

df = pd.read_csv("people.csv")
df


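If people.csv is not at hand, the same call can be tried on in-memory text. The sketch below is only an illustration: the column names and values are made up, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
csv_text = """First Name,Email,Age
Asha,asha@example.com,31
Ben,ben@example.com,45
"""

# read_csv accepts file-like objects as well as paths
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by Pandas
```
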
read_csv() function

The read_csv() function in Pandas reads data from a CSV file into a DataFrame, a data structure for manipulating and analyzing tabular data efficiently. CSV files are plain-text files in which each row represents a record and columns are separated by commas (or another delimiter).

Syntax

pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)

Parameters:

filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL or a file-like object.
sep: The separator between fields. The default is ','.
header: Row number (int) or list of row numbers to use as the column names; the data starts after that row. With the default 'infer', the first line is used as the header. With header=None, the file is treated as having no header row and the columns are labeled 0, 1, 2 and so on.
usecols: Retrieves only the selected columns from the CSV file.
nrows: Number of rows to read from the file.
index_col: Column(s) to use as the row labels. If set to None, Pandas assigns a default integer index (0, 1, 2, ...).
skiprows: Line numbers to skip (0-indexed, as a list) or the number of lines to skip at the start of the file (as an int).
engine: Parser engine to use ('c', 'python' or 'pyarrow').
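
The effect of header=None is easiest to see on a file with no header row. The following is a minimal sketch on made-up in-memory data; the names parameter is a real read_csv argument that is not in the list above, and the column names passed to it are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data with no header line
raw = "1,apple,0.5\n2,banana,0.25\n"

# header=None: the first line is data, columns get integer labels
no_header = pd.read_csv(io.StringIO(raw), header=None)
print(list(no_header.columns))

# names= supplies column labels for a headerless file
named = pd.read_csv(io.StringIO(raw), header=None,
                    names=["id", "fruit", "price"])
print(named)
```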

Features in Pandas read_csv

1. Read specific columns using read_csv

The usecols parameter lets you load only specific columns from a CSV file. This reduces memory usage and processing time by importing only the required data.

Python

df = pd.read_csv("people.csv", usecols=["First Name", "Email"])
print(df)


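As a self-contained sketch (the sample data below is invented for illustration), usecols can select columns either by name or by integer position:

```python
import io

import pandas as pd

# Hypothetical sample data
csv_text = (
    "First Name,Last Name,Email\n"
    "Asha,Rao,asha@example.com\n"
    "Ben,Lee,ben@example.com\n"
)

# Select columns by name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["First Name", "Email"])

# Select the same columns by 0-based position
by_pos = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(by_name)
```
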
2. Setting an Index Column (index_col)

The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.

Python

df = pd.read_csv("people.csv", index_col="First Name")
print(df)


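Once a column is the index, rows can be looked up by label with .loc. A minimal sketch, assuming made-up sample data:

```python
import io

import pandas as pd

# Hypothetical sample data
csv_text = (
    "First Name,Email\n"
    "Asha,asha@example.com\n"
    "Ben,ben@example.com\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="First Name")

# Look up a row by its label instead of its integer position
email = df.loc["Ben", "Email"]
print(email)
```
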
3. Handling Missing Values Using read_csv

The na_values parameter tells Pandas to treat additional strings (e.g., "N/A", "Unknown") as NaN while reading, enabling consistent handling of missing or incomplete data during analysis.

Python

df = pd.read_csv("people.csv", na_values=["N/A", "Unknown"])

na_values only specifies which values should be treated as NaN; it does not guarantee that the dataset has no missing values.
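
To check the effect, count the missing values after loading. The sketch below uses invented data; note that "N/A" is already among the strings Pandas treats as missing by default, so only "Unknown" needs to be listed here.

```python
import io

import pandas as pd

# Hypothetical sample data with two kinds of missing-value markers
csv_text = "name,city,age\nAsha,Unknown,31\nBen,Pune,N/A\n"

df = pd.read_csv(io.StringIO(csv_text), na_values=["Unknown"])

# Count NaN values per column
missing = df.isna().sum()
print(missing)
```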

4. Reading CSV Files with Different Delimiters

In this example, we create a CSV file that mixes several special characters as delimiters to see how the sep parameter works.

Python

import pandas as pd

data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""

with open("sample.csv", "w") as file:
    file.write(data)
print(data)

Output

totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Fe...

The sample data is stored in a multi-line string for demonstration purposes.

Separator (sep): The sep='[:, |_]' argument is a regular-expression character class, so Pandas splits on any of the listed characters: colon, comma, space, pipe or underscore.
Engine: The engine='python' argument is used because the default C engine does not support regular expressions for delimiters. Without it, Pandas falls back to the Python engine and emits a ParserWarning.

Python

df = pd.read_csv('sample.csv',
                 sep='[:, |_]',  
                 engine='python')  
df


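For cleaner files, a single-character separator is usually enough and works with the default engine. The sketch below, on made-up data, contrasts that with a small regular-expression separator that also absorbs surrounding spaces:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data: a plain string separator suffices
semi_text = "total_bill;tip;day\n16.99;1.01;Sun\n10.34;1.66;Sun\n"
semi = pd.read_csv(io.StringIO(semi_text), sep=";")

# Hypothetical data mixing '|' and ';' with spaces around them:
# a regex separator requires engine="python"
mixed_text = "a | b ; c\n1 | 2 ; 3\n"
mixed = pd.read_csv(io.StringIO(mixed_text), sep=r"\s*[|;]\s*", engine="python")

print(semi)
print(mixed)
```
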
5. Using nrows in read_csv()

The nrows parameter limits the number of rows read from a file, enabling quick previews or partial loading of large datasets. Here, only the first 3 data rows are read.

Python

df = pd.read_csv('people.csv', nrows=3)
df


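A self-contained sketch of the same idea, using a generated 100-row table in place of a real file (the column names are hypothetical):

```python
import io

import pandas as pd

# Build a hypothetical 100-row CSV in memory
rows = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(100))

# Only the first 3 data rows are parsed; the header line is not counted
preview = pd.read_csv(io.StringIO(rows), nrows=3)
print(preview)
```
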
6. Using skiprows in read_csv()

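The skiprows parameter skips lines while reading. An integer skips that many lines from the start of the file, which is useful for ignoring metadata or extra headers that are not part of the dataset; a list skips the lines at those 0-indexed positions. Here, skiprows=[4, 5] drops lines 4 and 5 of the file, counting the header line as line 0.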

Python

df = pd.read_csv("people.csv")
print("Original dataset:")
print(df)

# Skip lines 4 and 5 of the file (0-indexed; the header is line 0)
df = pd.read_csv("people.csv", skiprows=[4, 5])
print("Dataset after skipping rows:")
print(df)


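The difference between the integer and list forms of skiprows can be sketched on a small in-memory file. The data and the leading comment lines below are invented for illustration:

```python
import io

import pandas as pd

# Hypothetical file: two metadata lines, then the header and three records
csv_text = (
    "# exported 2024-01-01\n"   # line 0
    "# source: demo\n"          # line 1
    "id,name\n"                 # line 2 (header)
    "1,Asha\n"                  # line 3
    "2,Ben\n"                   # line 4
    "3,Chen\n"                  # line 5
)

# Integer: skip the first 2 lines of the file
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# List: skip specific 0-indexed lines (both metadata lines and Ben's row)
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 4])

print(df_int)
print(df_list)
```
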
7. Parsing Dates (parse_dates)

The parse_dates parameter converts date columns into datetime objects, simplifying operations like filtering, sorting or time-based analysis.

Python

df = pd.read_csv("people.csv", parse_dates=["Date of birth"])
df.info()  # info() prints its summary directly and returns None


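With the column parsed as datetime, the .dt accessor becomes available. A minimal sketch, assuming made-up ISO-formatted dates:

```python
import io

import pandas as pd

# Hypothetical sample data with ISO-formatted dates
csv_text = "name,Date of birth\nAsha,1993-04-12\nBen,1979-11-30\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date of birth"])

# Datetime columns expose components such as the year through .dt
birth_years = df["Date of birth"].dt.year.tolist()
print(df.dtypes)
print(birth_years)
```
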
Loading CSV Data from a URL

Pandas allows you to directly read a CSV file hosted on the internet using the file's URL. This can be incredibly useful when working with datasets shared on websites, cloud storage or public repositories like GitHub.

Python

url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
df = pd.read_csv(url)
df

