Fixing ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

Problem Overview

When working with pandas DataFrames stored as artifacts in Metaflow (or any pickled pandas object), you may encounter this error after upgrading to pandas 2.0.0 or newer:

python
ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

The key characteristics of this issue:

  1. Occurs when accessing DataFrame properties like df.index (not during initial unpickling)
  2. Persists even after pandas upgrades (pip install pandas -U)
  3. Primarily affects files pickled with pandas 1.x and loaded in 2.x environments
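  4. Caused by pandas 2.0 removing the internal pandas.core.indexes.numeric module (where Int64Index, UInt64Index, and Float64Index were defined), so pickles that reference that import path can no longer resolve it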
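
The root cause can be reproduced without pandas. The sketch below is a toy illustration, not pandas code: `legacy_numeric_mod` and `Int64IndexLike` are hypothetical stand-ins for a removed internal module and a class it used to define. It shows that a pickle records a class's import path rather than its definition, so loading fails once that path no longer exists.

```python
import pickle
import sys
import types

# Hypothetical stand-in for an internal module that a later release removes.
legacy = types.ModuleType("legacy_numeric_mod")


class Int64IndexLike:
    """Toy class playing the role of an index type from the old release."""

    def __init__(self, values):
        self.values = list(values)


# Make the class look as if it lives in the legacy module.
Int64IndexLike.__module__ = "legacy_numeric_mod"
Int64IndexLike.__qualname__ = "Int64IndexLike"
legacy.Int64IndexLike = Int64IndexLike
sys.modules["legacy_numeric_mod"] = legacy

# "Old environment": pickling stores the import path of the class.
payload = pickle.dumps(Int64IndexLike([1, 2, 3]))

# "New environment": the module has been removed.
del sys.modules["legacy_numeric_mod"]

try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    # exc.name holds the module that could not be imported.
    error_name = exc.name
```

Upgrading the library does not help here, because the failing import path is baked into the stored bytes, not into the installed package.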

Preferred Method: Use pandas.read_pickle()

Load files using pandas' built-in deserialization method for version compatibility:

python
import pandas as pd

# Load pickled DataFrame (works across pandas versions)
file_path = "artifacts/file.pkl"  # Replace with your actual path
df = pd.read_pickle(file_path)

Why this works:

  • Handles pandas internal API changes transparently
  • Backward compatible to pandas 0.20.3 according to the pandas documentation, but only guaranteed for objects written with to_pickle()
  • Resolves missing module dependencies
  • Works with Metaflow artifacts by reading from their storage path

Fallback Solution: Compatibility Shims

For situations where pd.read_pickle() fails:

python
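from pandas.compat import pickle_compat

# pickle_compat is an internal pandas module and may change between releases.
# Its load() expects an open binary file handle, not a path string.
with open("file.pkl", "rb") as fh:
    df = pickle_compat.load(fh)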
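
Conceptually, a compatibility shim redirects old import paths to their current location while unpickling. The following is a minimal sketch of that idea using only the standard library's `pickle.Unpickler.find_class` hook. It is not pandas' actual implementation, and `old_numeric_mod`, `new_base_mod`, `Marker`, and `RENAMED` are hypothetical names chosen for illustration.

```python
import io
import pickle
import sys
import types

# Hypothetical mapping: (old module, class name) -> (new module, class name).
RENAMED = {("old_numeric_mod", "Marker"): ("new_base_mod", "Marker")}


class RemappingUnpickler(pickle.Unpickler):
    """Redirect lookups for classes whose module was moved or removed."""

    def find_class(self, module, name):
        module, name = RENAMED.get((module, name), (module, name))
        return super().find_class(module, name)


def load_with_remap(data):
    """Unpickle bytes, applying the RENAMED redirections."""
    return RemappingUnpickler(io.BytesIO(data)).load()


# --- Demo setup: simulate a class that moved between two modules ---
class Marker:
    def __init__(self, label):
        self.label = label


Marker.__qualname__ = "Marker"


def _register(module_name):
    """Expose Marker under the given (hypothetical) module name."""
    mod = types.ModuleType(module_name)
    mod.Marker = Marker
    Marker.__module__ = module_name
    sys.modules[module_name] = mod


_register("old_numeric_mod")
old_pickle = pickle.dumps(Marker("legacy"))  # written by the "old" release

del sys.modules["old_numeric_mod"]           # the old module is gone
_register("new_base_mod")                    # the class now lives here

restored = load_with_remap(old_pickle)
```

A redirection like this only helps when the class at the new location can still accept the state stored in the old pickle, which is one reason re-saving the data from the original environment is often the simpler route.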

Version Locking Approach (If Solutions Fail)

If you need to maintain legacy systems:

bash
pip install "pandas<2.0.0"   # Downgrade to latest 1.x version

Important Considerations

  1. Metaflow-Specific Workflow:
    • In Metaflow flows, construct the artifact path correctly:
      python
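      from metaflow import Flow
      flow = Flow('YourFlowName')
      run = flow.latest_successful_run
      # Caution: run.data.<name> returns the unpickled object itself, so the
      # `.path` attribute below is unverified; confirm where your datastore keeps the file
      file_path = run.data.dataframe.path  # Replace 'dataframe' with your artifact name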

WARNING

Loading with pickle.load() or joblib.load() instead of pd.read_pickle() may trigger this error, because neither applies pandas' compatibility handling for internal module paths that no longer exist in v2.x.

Prevention Strategies

  1. Pin pandas versions in dependencies:

    text
    # requirements.txt
    pandas>=1.5.3,<2.0.0
  2. Migrate serialization formats:

    python
    # Use columnar formats instead of pickle (both calls need pyarrow installed;
    # to_parquet can also use fastparquet)
    df.to_parquet("data.parquet")      # Format does not depend on pandas internals
    df.to_feather("data.feather")      # Faster I/O

Best Practice

Use the same major pandas version for both pickling and unpickling. Major releases (1.x → 2.x) can move or remove the internal modules that pickles reference.
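
One way to enforce this rule is to record the writing version next to each pickle and compare it at load time. The sketch below uses only the standard library so it runs anywhere; `dump_with_version`, `load_if_same_major`, and the `.meta.json` sidecar file are hypothetical conventions, not part of pandas or Metaflow. In practice the version string would presumably come from `pd.__version__`.

```python
import json
import pickle
import tempfile
from pathlib import Path


def dump_with_version(obj, path, lib_version):
    """Pickle obj and record the writing library's version next to it."""
    path = Path(path)
    path.write_bytes(pickle.dumps(obj))
    meta_path = path.with_suffix(".meta.json")
    meta_path.write_text(json.dumps({"lib_version": lib_version}))


def load_if_same_major(path, lib_version):
    """Unpickle only when the recorded and current major versions match."""
    path = Path(path)
    meta = json.loads(path.with_suffix(".meta.json").read_text())
    written = meta["lib_version"]
    if written.split(".")[0] != lib_version.split(".")[0]:
        raise RuntimeError(
            f"written with {written}, loading with {lib_version}"
        )
    return pickle.loads(path.read_bytes())


# Demo with a plain dict standing in for a DataFrame.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "frame.pkl"
    dump_with_version({"rows": 3}, target, "1.5.3")

    # Same major version (1.x): the load goes through.
    same_major = load_if_same_major(target, "1.4.0")

    # Different major version (2.x): fail early with a clear message.
    try:
        load_if_same_major(target, "2.0.0")
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
```

Failing with an explicit version message is easier to diagnose than a ModuleNotFoundError raised from deep inside the unpickler.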