Often you may want to check if two data frames contain the same rows (regardless of order) in R.
Fortunately this is easy to do with the setequal() function from the dplyr package, which compares the rows of two data frames without regard to their order.
The setequal() function uses the following basic syntax:
setequal(x, y)
where:
- x: The name of the first data frame
- y: The name of the second data frame
Note that this function returns TRUE if both data frames contain the same rows, in any order, and FALSE otherwise.
The following example shows how to use the setequal() function from the dplyr package in practice.
Note: Before using the setequal() function, you may need to first install the dplyr package by using the following syntax:
install.packages('dplyr')
Once the dplyr package is installed, you can use the setequal() function.
Example: How to Use the setequal() Function in dplyr
Suppose we create the following two data frames named df1 and df2:
#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 34))

df2

  team points
1    A     14
2    A     14
3    A     25
4    A     19
5    B     40
6    B     34
Suppose that we would like to check if the two data frames contain the same rows, regardless of whether or not the rows are in the same order.
We can use the setequal() function from the dplyr package to do so:
library(dplyr)

#check if both data frames contain the same rows
setequal(df1, df2)

[1] TRUE
This returns TRUE, which tells us that the two data frames contain the same rows.
Note that rows 3 and 4 are swapped between the two data frames, but since both data frames contain the same set of rows the setequal() function still returns TRUE.
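To see why the order-insensitive comparison matters, it can help to run a strict comparison next to it. The sketch below uses identical() from base R, which is not part of the original example and is included here only for contrast; the variable names strict_result and set_result are illustrative.

```r
library(dplyr)

# Same data as in the example: rows 3 and 4 are swapped in df2
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 34))

# identical() compares the two objects exactly, so row order matters
strict_result <- identical(df1, df2)

# setequal() compares the rows as sets, so row order does not matter
set_result <- setequal(df1, df2)

strict_result
set_result
```

Here the strict comparison should report a mismatch because the points column differs at positions 3 and 4, while setequal() treats the two data frames as equal.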
Suppose instead that we changed the last value of the points column in the second data frame:
#create first data frame
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))

df1

  team points
1    A     14
2    A     14
3    A     19
4    A     25
5    B     40
6    B     34

#create second data frame
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 60))

df2

  team points
1    A     14
2    A     14
3    A     25
4    A     19
5    B     40
6    B     60
Now suppose that we would like to check if these two data frames contain the same rows.
We can use the setequal() function from the dplyr package once again to do so:
library(dplyr)

#check if both data frames contain the same rows
setequal(df1, df2)

[1] FALSE
This returns FALSE, which tells us that the two data frames do not contain the same rows.
This is the expected result since we intentionally changed the last row in the second data frame to be different.
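When setequal() returns FALSE, a natural follow-up is to find out which rows differ. One possible approach, not part of the original example, is to use the anti_join() function from dplyr, which keeps the rows of the first data frame that have no match in the second. The names only_in_df1 and only_in_df2 below are illustrative.

```r
library(dplyr)

# Same data as in the second example: the last row of df2 has been changed
df1 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 19, 25, 40, 34))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B'),
                  points=c(14, 14, 25, 19, 40, 60))

# rows that appear in df1 but not in df2
only_in_df1 <- anti_join(df1, df2, by=c('team', 'points'))

# rows that appear in df2 but not in df1
only_in_df2 <- anti_join(df2, df1, by=c('team', 'points'))

only_in_df1
only_in_df2
```

For this data, each result should contain a single row: the team B row with 34 points exists only in df1, and the team B row with 60 points exists only in df2.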
Note: You can find the complete documentation for the setequal() function in the dplyr package reference.
Additional Resources
The following tutorials explain how to perform other common tasks in R:
How to Use slice_min() in dplyr
How to Use the pull() Function in dplyr
How to Use top_n() in dplyr
How to Rename Columns Using dplyr