dplyr summarise() regrouping output message
When using dplyr's summarise() function with grouped data, you may encounter a message like:
`summarise()` regrouping output by 'year' (override with `.groups` argument)
This message indicates how dplyr is handling the grouping structure of your output data frame, not an error in your analysis.
Understanding the message
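The message appears when you use summarise() on grouped data. By default, dplyr "peels off" the last grouping variable after summarization: with several grouping variables the output stays grouped by the remaining ones ("regrouping"), and with a single grouping variable the output is left ungrouped ("ungrouping").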
Example with single grouping variable
library(dplyr)
mtcars %>%
  group_by(am) %>%
  summarise(mpg = sum(mpg))
Output:
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 × 2
     am   mpg
  <dbl> <dbl>
1     0  326.
2     1  317.
With a single grouping variable, dplyr completely removes the grouping after summarization.
Example with multiple grouping variables
mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))
Output:
`summarise()` regrouping output by 'am' (override with `.groups` argument)
# A tibble: 4 × 3
# Groups:   am [2]
     am    vs   mpg
  <dbl> <dbl> <dbl>
1     0     0  181.
2     0     1  145.
3     1     0  118.
4     1     1  199.
Here, dplyr preserved the first grouping variable (am) but dropped the second (vs).
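If you would rather check the remaining grouping than read it off the printed header, a minimal sketch along these lines may help. It assumes the same mtcars pipeline as above; the name by_am_vs is only illustrative.

```r
library(dplyr)

# Same pipeline as above, stored so the result can be inspected.
# The informational message may still be printed here.
by_am_vs <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

# Names of the grouping variables that remain after summarise()
group_vars(by_am_vs)
#> [1] "am"
```

group_vars() returns an empty character vector for an ungrouped data frame, so the same check also works for the single-variable example.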
The .groups argument
You can control this behavior using the .groups argument in summarise():
# Remove all grouping
mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'drop')
# Keep original grouping structure
mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'keep')
# Default behavior (drop last group)
mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'drop_last')
INFO
The .groups argument accepts four options:
- "drop_last": Drops the last level of grouping (the default when every group summarises to a single row)
- "drop": Removes all grouping
- "keep": Preserves the original grouping structure
- "rowwise": Treats each row as its own group
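As a rough illustration of the four options, the sketch below applies each one to the same grouped input and reports which grouping variables survive. The names grouped, out and rw are illustrative only.

```r
library(dplyr)

grouped <- mtcars %>% group_by(am, vs)

# Print the grouping variables left behind by each option
for (opt in c("drop_last", "drop", "keep")) {
  out <- grouped %>% summarise(mpg = sum(mpg), .groups = opt)
  cat(opt, "->", paste(group_vars(out), collapse = ", "), "\n")
}

# "rowwise" yields a rowwise data frame instead of a grouped one
rw <- grouped %>% summarise(mpg = sum(mpg), .groups = "rowwise")
inherits(rw, "rowwise_df")
```

Because .groups is set explicitly in every call, none of these calls should print the regrouping message.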
Why this matters
The grouping structure affects subsequent operations. For example:
# With default regrouping
result1 <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg))
# Subsequent summarization uses the remaining grouping
result1 %>% summarise(min_avg_mpg = min(avg_mpg))
Versus:
# With all grouping dropped
result2 <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'drop')
# Subsequent summarization operates on the entire dataset
result2 %>% summarise(min_avg_mpg = min(avg_mpg))
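To make the difference concrete, the sketch below rebuilds both results with .groups set explicitly (so no message is printed) and counts the rows each follow-up summarise() returns. The names by_cyl and overall are illustrative.

```r
library(dplyr)

# Grouping by cyl is kept: equivalent to the default behaviour here
result1 <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")

# All grouping removed
result2 <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")

by_cyl  <- result1 %>% summarise(min_avg_mpg = min(avg_mpg))
overall <- result2 %>% summarise(min_avg_mpg = min(avg_mpg))

nrow(by_cyl)   # 3: one row per value of cyl (4, 6, 8)
nrow(overall)  # 1: a single minimum over all rows
```

The same summarise() call therefore answers two different questions depending on the grouping it inherits.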
Practical recommendations
- Explicitly specify .groups for reproducible code
- Use .groups = "drop" when you're done with grouped operations
- The message is informational: it does not change what summarise() returns, although any grouping left on the output still affects later steps
TIP
To suppress these messages globally, set:
options(dplyr.summarise.inform = FALSE)
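If changing the option for the whole session feels too broad, one possible pattern is to set it temporarily and restore the previous value afterwards. This is only a sketch using base R's options(); the names old_opts and quiet are illustrative.

```r
library(dplyr)

# options() returns the previous values, which lets us restore them later
old_opts <- options(dplyr.summarise.inform = FALSE)

# No grouping message should be printed while the option is FALSE
quiet <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

# Restore whatever setting was in place before
options(old_opts)
```

Note that silencing the message does not change the result: quiet is still grouped by am.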
Historical context
This messaging behavior was introduced in dplyr 1.0.0 to make grouping behavior more transparent. Prior versions would silently drop grouping levels, which sometimes led to unexpected results in multi-step data processing pipelines.
The message serves as a reminder to be aware of the current grouping structure of your data, especially when chaining multiple dplyr operations together.