dplyr summarise() regrouping output message

When using dplyr's summarise() function with grouped data, you may encounter a message like:

`summarise()` regrouping output by 'year' (override with `.groups` argument)

This is an informational message, not an error in your analysis: it tells you which grouping structure dplyr has left on the output data frame.

Understanding the message

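The message appears when you call summarise() on grouped data without specifying .groups. By default, dplyr "peels off" the last grouping variable after summarization. With several grouping variables, the output stays grouped by the remaining ones ("regrouping output by ..."); with a single grouping variable, no grouping is left ("ungrouping output").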

Example with single grouping variable

library(dplyr)

mtcars %>%
  group_by(am) %>% 
  summarise(mpg = sum(mpg))

Output:

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 × 2
     am   mpg
  <dbl> <dbl>
1     0  326.
2     1  317.

With a single grouping variable, dplyr completely removes the grouping after summarization.

Example with multiple grouping variables

mtcars %>% 
  group_by(am, vs) %>% 
  summarise(mpg = sum(mpg))

Output:

`summarise()` regrouping output by 'am' (override with `.groups` argument)
# A tibble: 4 × 3
# Groups:   am [2]
    am    vs   mpg
  <dbl> <dbl> <dbl>
1     0     0  181.
2     0     1  145.
3     1     0  118.
4     1     1  199.

Here, dplyr kept the first grouping variable (am) and dropped the last one (vs) from the grouping. vs is still a column in the output; it just no longer defines groups.
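
To check which grouping is left after a summary like this one, one option is to inspect the result with group_vars(), a dplyr helper that returns the grouping variables as a character vector. A minimal sketch, where by_am_vs is only an illustrative name:

```r
library(dplyr)

# Same summary as in the example, with the default spelled out
by_am_vs <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = "drop_last")

group_vars(by_am_vs)        # "am": vs no longer defines groups
is_grouped_df(by_am_vs)     # TRUE: the result is still a grouped tibble
"vs" %in% names(by_am_vs)   # TRUE: vs remains as an ordinary column
```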

The `.groups` argument

You can control this behavior using the .groups argument in summarise():

# Remove all grouping
mtcars %>% 
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'drop')

# Keep original grouping structure
mtcars %>% 
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'keep')

# Default behavior (drop last group)
mtcars %>% 
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = 'drop_last')
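
The difference between these calls is not visible in the summarised values, only in the grouping attached to the result. As a rough sketch, the grouping left by each option can be compared with group_vars(). The res_* names are illustrative, and the "rowwise" option is included for completeness:

```r
library(dplyr)

grouped <- mtcars %>% group_by(am, vs)

# One summary per .groups option
res_drop_last <- grouped %>% summarise(mpg = sum(mpg), .groups = "drop_last")
res_drop      <- grouped %>% summarise(mpg = sum(mpg), .groups = "drop")
res_keep      <- grouped %>% summarise(mpg = sum(mpg), .groups = "keep")
res_rowwise   <- grouped %>% summarise(mpg = sum(mpg), .groups = "rowwise")

group_vars(res_drop_last)             # "am"
group_vars(res_drop)                  # character(0), no grouping left
group_vars(res_keep)                  # "am" "vs"
inherits(res_rowwise, "rowwise_df")   # TRUE, each row is its own group
```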

INFO

The .groups argument accepts four options:

"drop_last": Drops the last level of grouping. This is what you get by default when every group's summary has exactly one row
"drop": Removes all grouping
"keep": Preserves the original grouping structure
"rowwise": Treats each row of the output as its own group

Why this matters

The grouping structure affects subsequent operations. For example:

# With default regrouping
result1 <- mtcars %>% 
  group_by(cyl, am) %>% 
  summarise(avg_mpg = mean(mpg))

# result1 is still grouped by cyl, so this returns one row per cyl value
result1 %>% summarise(min_avg_mpg = min(avg_mpg))

Versus:

# With all grouping dropped
result2 <- mtcars %>% 
  group_by(cyl, am) %>% 
  summarise(avg_mpg = mean(mpg), .groups = 'drop')

# result2 has no grouping, so this returns a single row for the whole data set
result2 %>% summarise(min_avg_mpg = min(avg_mpg))
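
To make the contrast concrete, the two follow-up summaries can be compared by their row counts. This is a small sketch with illustrative object names (grouped_avg, flat_avg, per_cyl, overall), and .groups is set explicitly so no message is printed:

```r
library(dplyr)

# Grouping by cyl is kept after the first summary
grouped_avg <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop_last")

# All grouping is removed after the first summary
flat_avg <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")

per_cyl <- grouped_avg %>% summarise(min_avg_mpg = min(avg_mpg), .groups = "drop")
overall <- flat_avg %>% summarise(min_avg_mpg = min(avg_mpg))

nrow(per_cyl)   # 3, one row for each value of cyl
nrow(overall)   # 1, a single minimum across all cyl/am combinations
```

The same second step gives a different shape of result depending only on the grouping inherited from the first step.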

Practical recommendations

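Specify .groups explicitly so the grouping of the result is visible in the code
Use .groups = "drop" when you're done with grouped operations
Treat the message as informational: the summarised values are the same whichever option applies, but later steps in the pipeline depend on the grouping that is left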
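
An alternative to .groups = "drop" that you may see in existing code is a trailing ungroup() call. As a sketch under the assumption that each group yields one summary row, the two approaches should leave the same ungrouped result (with_drop and with_ungroup are illustrative names):

```r
library(dplyr)

# Drop all grouping inside summarise()
with_drop <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = "drop")

# Keep the default grouping, then remove it in a separate step
with_ungroup <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg), .groups = "drop_last") %>%
  ungroup()

# Neither result carries any grouping variables
length(group_vars(with_drop))      # 0
length(group_vars(with_ungroup))   # 0

# The data itself is the same
all.equal(as.data.frame(with_drop), as.data.frame(with_ungroup))
```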

TIP

To suppress these messages globally, set:

options(dplyr.summarise.inform = FALSE)
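
If a session-wide setting is broader than you want, the change can be scoped. The sketch below uses only base R: options() returns the previous values so they can be restored afterwards, and suppressMessages() silences a single expression. Note that suppressMessages() hides every message raised inside the call, not only dplyr's. The names old, quiet and also_quiet are illustrative.

```r
library(dplyr)

# Turn the message off, remembering the previous setting
old <- options(dplyr.summarise.inform = FALSE)

quiet <- mtcars %>%
  group_by(am, vs) %>%
  summarise(mpg = sum(mpg))

# Restore whatever was set before
options(old)

# Alternative: silence one expression only
also_quiet <- suppressMessages(
  mtcars %>%
    group_by(am, vs) %>%
    summarise(mpg = sum(mpg))
)
```

Setting .groups explicitly remains the more direct fix, since it removes the reason for the message instead of hiding it.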

Historical context

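This messaging behavior was introduced in dplyr 1.0.0 to make grouping behavior more transparent. Prior versions would silently drop the last grouping level, which sometimes led to unexpected results in multi-step data processing pipelines. Later releases reword the message (for example, "`summarise()` has grouped output by 'am'"), but the meaning is the same.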

The message serves as a reminder to be aware of the current grouping structure of your data, especially when chaining multiple dplyr operations together.
