CSV (comma-separated values) files store tabular data as plain text.
- To access data in a CSV file, we use the Pandas function read_csv(), which returns the data as a DataFrame.
- First, we import the Pandas library, then use it to load the data into a DataFrame.
In the code below, we work with a CSV file named people.csv, which contains data about people.
import pandas as pd
df = pd.read_csv("people.csv")
df
read_csv() function
The read_csv() function in Pandas reads data from CSV files into a Pandas DataFrame. A DataFrame is a data structure that lets you manipulate and analyze tabular data efficiently. CSV files are plain-text files in which each row represents a record and columns are separated by commas (or other delimiters).
Syntax
pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)
Parameters:
- filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL or a file-like object.
- sep: The separator (delimiter) between fields. The default is ','.
- header: Row number (int) or list of row numbers to use as the column names; the data starts after the header. With the default 'infer' and no names passed, the first row supplies the column names. With header=None, no row is treated as a header and the columns are labelled 0, 1, 2 and so on.
- usecols: Reads only the selected columns from the CSV file.
- nrows: Number of rows to read from the file.
- index_col: Column(s) to use as the row labels. If set to None (the default), Pandas assigns a default integer index (0, 1, 2, ...).
- skiprows: Line numbers to skip (0-indexed), or the number of lines to skip at the start of the file.
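The effect of header and of combining several of these parameters can be seen in a minimal sketch. The CSV text and its column names (name, age, city) are made up for illustration and are read from an in-memory buffer rather than from people.csv.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV used only for illustration.
csv_text = "name,age,city\nAsha,31,Pune\nBen,27,Leeds\nChloe,45,Lyon\n"

# header='infer' (the default): the first line supplies the column names.
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: no line is treated as a header, so the columns are numbered
# 0, 1, 2 and the original header line becomes the first data row.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Several parameters together: keep two columns, use "name" as the index
# and read only the first two data rows.
df_subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["name", "age"],
    index_col="name",
    nrows=2,
)

print(list(df_default.columns))
print(list(df_no_header.columns))
print(df_subset)
```

Reading from io.StringIO works because filepath_or_buffer accepts file-like objects as well as paths.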
Features in Pandas read_csv
1. Read specific columns using read_csv
The usecols parameter lets you load only specific columns from a CSV file. This reduces memory usage and processing time by importing only the required data.
df = pd.read_csv("people.csv", usecols=["First Name", "Email"])
print(df)
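As a self-contained sketch of the same idea, the snippet below selects columns by name and by position. The sample rows are invented for illustration; only the column names "First Name" and "Email" come from the example above, and "Last Name" is an assumed extra column.

```python
import io
import pandas as pd

# Hypothetical sample data; "Last Name" is an assumed column for illustration.
csv_text = (
    "First Name,Last Name,Email\n"
    "Asha,Rao,asha@example.com\n"
    "Ben,Lee,ben@example.com\n"
)

# Select columns by name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["First Name", "Email"])

# Select the same columns by zero-based position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(by_name)
print(by_name.equals(by_position))
```

Selecting by position can help when column names are long or unknown in advance, at the cost of breaking if the column order in the file changes.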

2. Setting an Index Column (index_col)
The index_col parameter sets one or more columns as the DataFrame index, making the specified column(s) act as row labels for easier data referencing.
df = pd.read_csv("people.csv", index_col="First Name")
print(df)
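One practical benefit of an index column is label-based lookup with .loc. The following sketch uses invented sample rows read from an in-memory buffer; it assumes the values in "First Name" are unique.

```python
import io
import pandas as pd

# Hypothetical sample data for illustration.
csv_text = (
    "First Name,Email\n"
    "Asha,asha@example.com\n"
    "Ben,ben@example.com\n"
)

# Use "First Name" as the row labels instead of the default 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), index_col="First Name")

# Look up a single value by row label and column name.
email = df.loc["Ben", "Email"]
print(email)
```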

3. Handling Missing Values Using read_csv
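The na_values parameter tells Pandas to treat additional strings (e.g., "N/A", "Unknown") as NaN, enabling consistent handling of missing or incomplete data during analysis. Pandas already recognizes some markers such as "N/A" by default; na_values adds to that list.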
df = pd.read_csv("people.csv", na_values=["N/A", "Unknown"])
na_values only specifies which values should be treated as NaN; it does not guarantee that the dataset has no missing values.
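To see the difference na_values makes, the sketch below reads the same text twice and counts missing values per column. The data and the placeholder strings "Unknown" and "missing" are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical data in which "Unknown" and "missing" stand for absent values.
csv_text = (
    "name,city,score\n"
    "Asha,Unknown,10\n"
    "Ben,Leeds,missing\n"
    "Chloe,Lyon,7\n"
)

# Without na_values, the placeholders are kept as ordinary strings.
df_plain = pd.read_csv(io.StringIO(csv_text))

# With na_values, the placeholders are converted to NaN while reading.
df_clean = pd.read_csv(io.StringIO(csv_text), na_values=["Unknown", "missing"])

# Count the missing values in each column.
missing_per_column = df_clean.isna().sum()
print(missing_per_column)
```

Because the placeholder in score becomes NaN, that column can be parsed as numeric instead of being left as text.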
4. Reading CSV Files with Different Delimiters
In this example, we take a CSV file and add some special characters to it to see how the sep parameter works.
import pandas as pd
data = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4"""
with open("sample.csv", "w") as file:
    file.write(data)
print(data)
Output
totalbill_tip, sex:smoker, day_time, size 16.99, 1.01:Female|No, Sun, Dinner, 2 10.34, 1.66, Male, No|Sun:Dinner, 3 21.01:3.5_Male, No:Sun, Dinner, 3 23.68, 3.31, Male|No, Sun_Dinner, 2 24.59:3.61, Fe...
The sample data is stored in a multi-line string for demonstration purposes.
- Separator (sep): The sep='[:, |_]' argument is a regular expression (a character class) that lets Pandas split on any of several delimiters: colon, comma, space, pipe or underscore.
- Engine: The engine='python' argument is used because the default C engine does not support regular expressions for delimiters.
df = pd.read_csv('sample.csv',
sep='[:, |_]',
engine='python')
df
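A smaller sketch makes the regular-expression behaviour easier to check by eye. The data below is invented for illustration and contains no spaces, so the character class only needs the colon, pipe and underscore.

```python
import io
import pandas as pd

# Hypothetical data mixing three different delimiters on every line.
mixed_text = "a:b|c_d\n1:2|3_4\n5:6|7_8\n"

# A character class matches any single one of the listed delimiters.
# engine="python" is requested because regex separators need the Python parser.
df_mixed = pd.read_csv(io.StringIO(mixed_text), sep=r"[:|_]", engine="python")

print(df_mixed)
```

If a space were added to the character class, input such as "1, 2" would split twice (at the comma and at the space) and produce an empty field, so it is worth keeping the class as narrow as the data allows.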

5. Using nrows in read_csv()
The nrows parameter limits the number of rows read from a file, enabling quick previews or partial loading of large datasets. Here, we read only the first 3 rows using the nrows parameter.
df = pd.read_csv('people.csv', nrows=3)
df
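The same behaviour can be checked without people.csv. This sketch builds a hypothetical 100-row CSV in memory (the id and value columns are assumptions) and reads only the first three data rows.

```python
import io
import pandas as pd

# Build a hypothetical 100-row CSV in memory.
csv_text = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1, 101))

# Read only the first three data rows; the header line is not counted.
preview = pd.read_csv(io.StringIO(csv_text), nrows=3)

print(preview)
```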

6. Using skiprows in read_csv()
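The skiprows parameter skips rows while reading a file. Passing an integer skips that many lines at the start of the file, which is useful for ignoring metadata or extra headers that are not part of the dataset; passing a list skips the lines at those 0-based positions. Here, skiprows=[4, 5] skips lines 4 and 5 of the file, counting the header line as line 0.
df = pd.read_csv("people.csv")
print("Previous Dataset: ")
print(df)
df = pd.read_csv("people.csv", skiprows=[4, 5])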
print("Dataset After skipping rows: ")
print(df)
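The integer and list forms of skiprows behave differently, which the following sketch tries to make concrete. The file content, including the two leading note lines, is hypothetical.

```python
import io
import pandas as pd

# Hypothetical file: two note lines, then the header, then four data rows.
csv_text = (
    "# exported by a hypothetical tool\n"
    "# second note line\n"
    "id,name\n"
    "1,Asha\n"
    "2,Ben\n"
    "3,Chloe\n"
    "4,Dev\n"
)

# Integer form: skip the first two lines, so "id,name" becomes the header.
skip_top = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# List form: skip lines 0 and 1 (the notes) and line 4 (the "2,Ben" row).
skip_listed = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 4])

print(skip_top)
print(skip_listed)
```

Line positions in the list form refer to the raw file, so they must account for any lines above the header.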

7. Parsing Dates (parse_dates)
The parse_dates parameter converts date columns into datetime objects, simplifying operations like filtering, sorting or time-based analysis.
df = pd.read_csv("people.csv", parse_dates=["Date of birth"])
df.info()
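Once a column is parsed as datetime, date components become available through the .dt accessor. This sketch uses invented rows with ISO-formatted dates; only the column name "Date of birth" comes from the example above.

```python
import io
import pandas as pd

# Hypothetical sample data with ISO-formatted dates.
csv_text = (
    "name,Date of birth\n"
    "Asha,1990-05-17\n"
    "Ben,1985-11-02\n"
)

# Parse the "Date of birth" column into datetime values while reading.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date of birth"])

# Confirm the column holds datetimes, then extract the year of each date.
is_datetime = pd.api.types.is_datetime64_any_dtype(df_dates["Date of birth"])
birth_years = df_dates["Date of birth"].dt.year.tolist()

print(is_datetime)
print(birth_years)
```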

Loading CSV Data from a URL
Pandas can read a CSV file hosted on the internet directly from the file's URL. This is useful when working with datasets shared on websites, cloud storage or public repositories like GitHub.
url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
df = pd.read_csv(url)
df
