Why does only part of my data print when I filter using dplyr? [duplicate]
My code is below:
df1 <- data.frame(attri = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
                            2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6,
                            7, 8, 9, 10, 11, 12, 13, 14, 15),
                  type = c(10, 14, 19, 25, 32, 33, 34, 35, 36,
                           37, 38, 39, 40, 41, 42, 43, 44, 49,
                           55, 52, 53, 54, 55, 56, 57, 58, 59,
                           60, 61, 62, 60, 64, 69, 75, 72, 73, 74,
                           75, 76, 77, 78, 79))
i <- 1:3
df2 <- dplyr::filter(df1, attri == i)
View(df2)
When I run this, it outputs the following data frame:
   attri type
1      1   10
2      1   25
3      1   34
4      1   37
5      2   38
6      2   41
7      2   44
8      2   52
9      3   53
10     3   56
11     3   59
12     3   62
How do I view the entire data set that satisfies the filter?
Thanks in advance
1 Answer
You want attri %in% i, not attri == i.
R "recycles" vectors if one is shorter than the other. For example, suppose you want to add two vectors, say x = c(1, 2, 3) and y = c(5), so you write x + y. R recycles the shorter vector (y) until it is the same length as the longer vector (x) and then adds the two element-wise. So x + y is really (1, 2, 3) + (5, 5, 5) = (6, 7, 8).
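If it helps to see the recycling directly, here is a small base R sketch using the same x and y as in the example. The rep() call is only there to spell out what R does implicitly.

```r
x <- c(1, 2, 3)
y <- c(5)

# y is recycled to the length of x before the element-wise addition
x + y   # 6 7 8

# Writing the recycled vector out by hand gives the same result
x + rep(y, length.out = length(x))
```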
Your test (attri == i) compares the attri vector to the vector (1, 2, 3). Because the second vector is shorter than the first, it is recycled to become (1, 2, 3, 1, 2, 3, 1, 2, ...) until it is the same length as attri. The == test then checks for equality element-wise, so a row is kept only where attri happens to equal the recycled value at that same position. That is why you get only every third row for attri values 1 to 3.
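A rough way to check this yourself, without dplyr: the sketch below rebuilds the attri values from the question as a plain vector (the name recycled is just for illustration) and compares them against the recycled version of i.

```r
attri <- c(rep(1:3, each = 10), 4:15)  # same values as df1$attri
i <- 1:3

# What attri == i is effectively compared against after recycling
recycled <- rep(i, length.out = length(attri))

# The implicit and the hand-written comparison agree at every position
identical(attri == i, attri == recycled)

# Positions where the recycled test happens to be TRUE
which(attri == i)   # 1 4 7 10 11 14 17 20 21 24 27 30
```

Those twelve positions are the twelve rows that ended up in df2.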
By using attri %in% i you are asking R whether each element of attri is 1, 2, or 3. This returns a vector of TRUE/FALSE values that is the same length as attri. filter() then selects the rows of df1 where the test is TRUE. This is what you want.
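As a quick comparison of the two tests, here is a minimal base R sketch on the attri values from the question (keep is a made-up name). It only counts how many elements each test marks as TRUE, which is the number of rows filter() would return.

```r
attri <- c(rep(1:3, each = 10), 4:15)  # same values as df1$attri
i <- 1:3

# One TRUE/FALSE per element of attri, with no recycling of i involved
keep <- attri %in% i
length(keep) == length(attri)

sum(keep)         # 30: every row with attri 1, 2 or 3
sum(attri == i)   # 12: only rows that line up with the recycled i
```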
Try:
df2 <- dplyr::filter(df1, attri %in% i)
I think you're running into an argument recycling issue. Examine the warning message when running filter(df1[-1,], attri == i).
– bouncyball
yesterday
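To reproduce the warning from the comment above without dplyr, one option is the base R sketch below. It drops the first attri value, mirroring df1[-1, ], so the length (41) is no longer a multiple of 3. The tryCatch wrapper and the name msg are just a convenient way to capture the warning text.

```r
attri <- c(rep(1:3, each = 10), 4:15)[-1]  # 41 values, like df1[-1, ]$attri
i <- 1:3

# Capture the warning raised by the comparison instead of printing it
msg <- tryCatch(
  {
    attri == i
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg
```

With the original 42 rows there is no warning, because 42 is a multiple of 3, which is why the recycling went unnoticed.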