Hi! I am not sure if the behavior I encountered was expected or not, or if it is documented anywhere. But I was certainly not expecting it! And it seems a little counter-intuitive to me if this is supposed to be the correct behavior.
Basically, when I run `filter()` on grouped data and the filtered column contains missing values, dtplyr returns a row of all missing values for each row where the condition is `NA`, whereas dplyr simply drops those rows.
I have noticed that the issue arises specifically when using `group_by()`: if I remove the `group_by()`, the code returns the same table as dplyr.
It seems like it would be better to fix dtplyr so that it matches dplyr, or at least to note in the documentation why dtplyr and dplyr differ here. Apologies again if this has already been discussed somewhere; if so, a link to that discussion would be very useful!
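For comparison, here is a minimal data.table sketch of why the ungrouped case behaves. It assumes that an ungrouped `filter()` is translated into a plain logical `i` such as `dt[value == 0]` (`dt` is a hypothetical name used only here). data.table treats `NA` in a logical `i` as `FALSE`, so the rows with a missing `value` are simply dropped.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

# Ungrouped: the condition is used directly as a logical i.
# data.table treats NA in a logical i as FALSE, so rows 1 and 2 are dropped.
ungrouped <- dt[value == 0]
nrow(ungrouped)
#> [1] 1
```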
Looks like this is happening because we use the `.I` trick when filtering by group. When the condition is `NA` for some rows, `.I[condition]` returns a vector containing `NA`s, and subsetting a data.table with `NA` row numbers produces rows filled with `NA`s.
```r
library(data.table) # needed for data.table(); dtplyr does not attach it
library(dtplyr)
library(dplyr)

example_df <- data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

example_df %>%
  lazy_dt() %>%
  filter(value == 0, .by = id)
#> Source: local data table [3 x 2]
#> Call: `_DT1`[`_DT1`[, .I[value == 0], by = .(id)]$V1]
#> 
#>      id value
#>   <dbl> <dbl>
#> 1    NA    NA
#> 2    NA    NA
#> 3     1     0
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

example_df[c(NA, NA, 3)]
#>       id value
#>    <num> <num>
#> 1:    NA    NA
#> 2:    NA    NA
#> 3:     1     0
```
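To isolate the mechanism, here is a minimal sketch in plain data.table. It builds the per-group row numbers the same way as the generated call above, then swaps in `which()`, which keeps only the positions where the condition is `TRUE`. Using `which()` is an illustrative fix, not necessarily what dtplyr does internally. The names `dt`, `idx_na` and `idx_ok` are hypothetical.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

# Row numbers per group, computed like the generated call above.
# Within the group, `value == 0` is c(NA, NA, TRUE, FALSE), and indexing
# .I with a logical NA yields NA, so the NAs survive into the index.
idx_na <- dt[, .I[value == 0], by = .(id)]$V1
idx_na
#> [1] NA NA  3

# which() returns only the positions where the condition is TRUE,
# so rows with a missing condition never reach the index.
idx_ok <- dt[, .I[which(value == 0)], by = .(id)]$V1
idx_ok
#> [1] 3

# Subsetting with the NA-free index gives the single row dplyr returns.
dt[idx_ok]
```

Because the change is confined to how the index is built, the outer subset `dt[...]` stays exactly as it was.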
I wound up including extra clauses in my `filter()` call to get the correct behavior.
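One possible form of such a workaround is sketched below. It is an assumption, not necessarily the clauses used above. The idea is to rule out missing values explicitly. For the `NA` rows the combined condition is `FALSE & NA`, which R evaluates to `FALSE`, so no `NA` ends up in the per-group row index. The sketch assumes dtplyr combines multiple `filter()` conditions with `&`; the result would be the same if they were applied one after another.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

example_df <- data.table(id = c(1, 1, 1, 1), value = c(NA, NA, 0, 1))

# Hypothetical workaround: guard against NA explicitly so that the
# condition is FALSE (not NA) for rows with a missing value.
res <- example_df %>%
  lazy_dt() %>%
  group_by(id) %>%
  filter(!is.na(value), value == 0) %>%
  as_tibble()

res
```

An alternative with the same effect is `filter(value %in% 0)`, since `%in%` returns `FALSE` rather than `NA` for missing values.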