Fixing ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'
Problem Overview
When working with pandas DataFrames stored as artifacts in Metaflow (or any pickled pandas object), you may encounter this error after upgrading to pandas 2.0.0 or newer:
ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'
The key characteristics of this issue:
- Occurs when accessing DataFrame properties like df.index (not during initial unpickling)
- Persists even after pandas upgrades (pip install pandas -U)
- Primarily affects files pickled with pandas 1.x and loaded in 2.x environments
- Caused by pandas 2.0 removing the internal module pandas.core.indexes.numeric, which pickles written by pandas 1.x still reference by import path
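To check which import paths a given pickle depends on, one option is a small unpickler subclass that records every lookup. The sketch below uses only the standard library. The names RecordingUnpickler and list_requested_globals are illustrative and not part of pandas or Metaflow, and the OrderedDict payload merely stands in for a real artifact. Like any unpickling, it should only be run on files you trust.

```python
import io
import pickle
from collections import OrderedDict


class RecordingUnpickler(pickle.Unpickler):
    """Record every (module, name) pair the pickle stream asks to import."""

    def __init__(self, file):
        super().__init__(file)
        self.requested = []

    def find_class(self, module, name):
        self.requested.append((module, name))
        return super().find_class(module, name)


def list_requested_globals(data: bytes):
    """Return the import paths referenced by a pickle, in lookup order."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except (ModuleNotFoundError, AttributeError) as exc:
        # Typically the last recorded entry is the one that failed to resolve.
        print(f"unresolved reference: {exc}")
    return unpickler.requested


# Stand-in payload (assumption). With a pandas 1.x pickle, the list would be
# expected to contain an entry whose module is 'pandas.core.indexes.numeric'.
payload = pickle.dumps(OrderedDict(a=1))
print(list_requested_globals(payload))
```

If the module named in the error message shows up in this list, the file itself carries the stale reference, and reinstalling or upgrading pandas will not change that.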
Recommended Solutions
Preferred Method: Use pandas.read_pickle()
Load files using pandas' built-in deserialization method for version compatibility:
import pandas as pd
# Load the pickled DataFrame; read_pickle remaps legacy pandas module paths where it can
file_path = "artifacts/file.pkl" # Replace with your actual path
df = pd.read_pickle(file_path)
Why this works:
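- Remaps classes that moved or were removed between pandas versions to their current locations while unpickling
- Documented as backward compatible to pandas 0.20.3 for objects serialized with to_pickle()
- Resolves references to modules that no longer exist, such as pandas.core.indexes.numeric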
- Works with Metaflow artifacts by reading from their storage path
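The remapping idea can be illustrated without pandas. In this minimal sketch, legacy_pkg and current_pkg are hypothetical module names registered in sys.modules purely for the demonstration, and RemappingUnpickler is an illustrative class, not the pandas implementation. It redirects a lookup for a removed module to the module where the class lives now.

```python
import io
import pickle
import sys
import types

# Hypothetical class that "used to live" in a module called legacy_pkg.
Point = type("Point", (), {"__module__": "legacy_pkg"})
legacy = types.ModuleType("legacy_pkg")
legacy.Point = Point
sys.modules["legacy_pkg"] = legacy

original = Point()
original.x = 3
data = pickle.dumps(original)  # the stream now references legacy_pkg.Point

# Simulate the upgrade: the old module is gone, the class lives elsewhere.
del sys.modules["legacy_pkg"]
current = types.ModuleType("current_pkg")
current.Point = Point
sys.modules["current_pkg"] = current


class RemappingUnpickler(pickle.Unpickler):
    """Redirect lookups for relocated classes before importing them."""

    RENAMES = {("legacy_pkg", "Point"): ("current_pkg", "Point")}

    def find_class(self, module, name):
        module, name = self.RENAMES.get((module, name), (module, name))
        return super().find_class(module, name)


try:
    pickle.loads(data)
except ModuleNotFoundError as exc:
    print(f"plain pickle fails: {exc}")

restored = RemappingUnpickler(io.BytesIO(data)).load()
print(restored.x)
```

pandas' compatibility layer appears to keep a comparable table of relocated classes, which would explain why pd.read_pickle() can succeed where a plain pickle.load() does not.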
Fallback Solution: Compatibility Shims
For situations where pd.read_pickle() fails:
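from pandas.compat import pickle_compat
# pickle_compat is internal to pandas and may change between releases.
# load() expects an open binary file handle, not a path string.
with open("file.pkl", "rb") as fh:
    df = pickle_compat.load(fh)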
Version Locking Approach (If the Above Solutions Fail)
If you need to maintain legacy systems:
pip install "pandas<2.0.0" # Downgrade to latest 1.x version
Important Considerations
- Metaflow-Specific Workflow:
  - In Metaflow flows, construct the artifact path correctly:
from metaflow import Flow
flow = Flow('YourFlowName')
run = flow.latest_successful_run
# Replace 'dataframe' with your artifact name. Accessing run.data.<name> normally
# loads (unpickles) the artifact itself, so confirm that .path exists in your setup.
file_path = run.data.dataframe.path
WARNING
Loading the file with the standard pickle module (pickle.load()) or with joblib.load() instead of pd.read_pickle() may trigger this error, because the pickle references internal pandas module paths that no longer exist in v2.x.
Prevention Strategies
Pin pandas versions in dependencies:
# requirements.txt
pandas>=1.5.3,<2.0.0
Migrate serialization formats:
# Use modern formats instead of pickle
df.to_parquet("data.parquet")  # Better version compatibility
df.to_feather("data.feather")  # Faster I/O
Best Practice
Always use the same major pandas version for both pickling and unpickling. Major releases (1.x → 2.x) can remove or relocate internal modules that existing pickles depend on.
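One way to enforce this is to record the writing version next to the pickle and compare major versions before loading. The sketch below is a minimal example under stated assumptions: the sidecar file name, the helper names, and the JSON layout are all hypothetical, and the version strings are passed in as plain arguments (in practice they would typically come from pd.__version__).

```python
import json
import tempfile
from pathlib import Path


def major(version: str) -> int:
    """Return the major component of a version string such as '1.5.3'."""
    return int(version.split(".")[0])


def write_version_sidecar(pickle_path: Path, pandas_version: str) -> Path:
    """Store the pandas version used for pickling in '<stem>.meta.json'."""
    sidecar = pickle_path.with_suffix(".meta.json")
    sidecar.write_text(json.dumps({"pandas_version": pandas_version}))
    return sidecar


def check_same_major(pickle_path: Path, running_version: str) -> None:
    """Raise if the pickle was written by a different pandas major version."""
    sidecar = pickle_path.with_suffix(".meta.json")
    written = json.loads(sidecar.read_text())["pandas_version"]
    if major(written) != major(running_version):
        raise RuntimeError(
            f"pickle written with pandas {written}, running {running_version}"
        )


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.pkl"
    write_version_sidecar(path, "1.5.3")
    check_same_major(path, "1.5.3")      # same major version: no error
    try:
        check_same_major(path, "2.0.0")  # different major version
    except RuntimeError as exc:
        print(exc)
```

Failing early with an explicit message is usually easier to act on than a ModuleNotFoundError raised from deep inside the unpickler.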