Pandas groupby.apply Deprecation Warning

Problem Statement

When using groupby.apply() in pandas 2.2.0+, you may encounter this warning:

python
DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. 
This behavior is deprecated, and in a future version of pandas the grouping columns 
will be excluded from the operation.

This occurs because pandas historically included the group-by columns in the group DataFrame passed to apply(). The new behavior (coming in pandas 3.0) will exclude these columns by default. The warning helps you update your code before the breaking change.

In your specific operation:

python
fprice = df.groupby(['StartDate', 'Commodity', 'DealType']).apply(
    lambda group: -(group['MTMValue'].sum() - 
                  (group['FixedPriceStrike'] * group['Quantity']).sum()) / 
    group['Quantity'].sum()
).reset_index(name='FloatPrice')

The grouping columns (StartDate, Commodity, DealType) are included in group but aren't used in your calculation.


Solution

Add include_groups=False to your apply() call:

python
fprice = df.groupby(['StartDate', 'Commodity', 'DealType']).apply(
    lambda group: -(group['MTMValue'].sum() - 
                  (group['FixedPriceStrike'] * group['Quantity']).sum()) / 
    group['Quantity'].sum(),
    include_groups=False  # ← Silences the warning
).reset_index(name='FloatPrice')

Why this works:

  • Your lambda function only uses MTMValue, FixedPriceStrike, and Quantity
  • include_groups=False excludes the group-by columns from group, matching pandas' future behavior
  • This fixes the warning while maintaining identical results

Key Insight

You need the grouping columns only in the final result, not during the calculation. Pandas places the group keys in the index of the result returned by apply(), and reset_index() turns them back into ordinary columns.


Explanation

Behavior Change in Pandas 2.2+

include_groups=   Current default   Future (3.0+)   Behavior
True              Default           -               Group-by columns included in group
False             -                 Default         Group-by columns excluded from group

Why this matters

  1. Avoid bugs: Including group-by columns can distort calculations (e.g., if they're numeric and you call mean())
  2. Efficiency: Excluding unused columns saves memory
  3. Consistency: Matches what developers intuitively expect

Incorrect Usage Example

This calculates incorrect means because it includes the numeric group-by column a:

python
# Bad: Includes group-by column 'a' in operations
df.groupby('a').apply(np.mean)

Output with a=[1,1,2,2] and b=[1,2,4,5]:

a
1    1.25  # Incorrect! (1+1+1+2)/4 = 1.25
2    3.25  # Incorrect! (2+2+4+5)/4 = 3.25

Solution:

python
df.groupby('a').apply(np.mean, include_groups=False)

This gives the correct result:

a
1    1.5  # (1+2)/2 = 1.5
2    4.5  # (4+5)/2 = 4.5

Alternative Solutions

If you do need access to the group-by columns during apply(), use:

1. Explicitly include columns in the group operation

python
# Manually list ALL columns to use (including group-by columns)
group_cols = ['StartDate', 'Commodity', 'DealType']
calc_cols = group_cols + ['MTMValue', 'FixedPriceStrike', 'Quantity']

fprice = df.groupby(group_cols)[calc_cols].apply(
    lambda group: ...  # Your logic
).reset_index(name='FloatPrice')
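As a concrete sketch of this pattern, the example below uses a hypothetical rule (quantities on 'Sell' deals count as negative) purely to give the function a reason to read a grouping column. The rule, the helper name signed_quantity, and the data are all invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "DealType": ["Buy", "Sell", "Buy", "Sell"],
    "Quantity": [5.0, 3.0, 2.0, 1.0],
})

group_cols = ["DealType"]
calc_cols = group_cols + ["Quantity"]

def signed_quantity(group):
    # Reads the grouping column, which is available because it was
    # selected explicitly after groupby().
    sign = -1.0 if group["DealType"].iloc[0] == "Sell" else 1.0
    return sign * group["Quantity"].sum()

signed = (
    df.groupby(group_cols)[calc_cols]
      .apply(signed_quantity)
      .reset_index(name="SignedQuantity")
)
print(signed)
```

Because the columns are selected explicitly, no include_groups argument is needed here.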

2. Use group names via group.name

python
fprice = df.groupby('DealType').apply(
    lambda group: (
        # group.name is the group key itself (e.g., 'A')
        f"{group.name}: {group['value'].sum()}"
    ),
    include_groups=False
)

Final Recommendation

For most users (especially if you don't use group-by columns in calculations):

  1. Add include_groups=False to apply() calls
  2. Test results with small datasets to confirm identical output

Your corrected code:

python
fprice = df.groupby(['StartDate', 'Commodity', 'DealType']).apply(
    lambda group: -(group['MTMValue'].sum() - 
                  (group['FixedPriceStrike'] * group['Quantity']).sum()) / 
    group['Quantity'].sum(),
    include_groups=False  # Fixes warning + future-proofs code
).reset_index(name='FloatPrice')