Fixing ImportError: cannot import name 'joblib' from 'sklearn.externals'
Problem Overview
When you try to import `joblib` from `sklearn.externals`, you get this error:
```
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals'
```
This issue typically occurs with older code written for scikit-learn versions prior to 0.23, where joblib was available through `sklearn.externals`. That path was deprecated in scikit-learn 0.21 and removed completely in 0.23, so any newer version raises this error.
Compatibility Issue
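This ImportError comes from the import statement itself: your code was written for a scikit-learn version older than 0.23 and needs to be updated to work with current versions. Models saved through the old path can fail to load for a related reason, covered under Handling Legacy Pickle Files below.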
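To see which situation applies, you can check whether `sklearn.externals.joblib` can still be located in the current environment. The sketch below uses `importlib.util.find_spec`, which imports the parent packages but not the submodule itself; `has_legacy_joblib` is a hypothetical helper name, not part of scikit-learn or joblib.

```python
import importlib.util


def has_legacy_joblib() -> bool:
    """Hypothetical helper: True if sklearn.externals.joblib can be found.

    find_spec() imports the parent packages (sklearn, sklearn.externals) but
    does not import sklearn.externals.joblib itself.
    """
    try:
        return importlib.util.find_spec("sklearn.externals.joblib") is not None
    except ImportError:
        # scikit-learn (or sklearn.externals) is not installed at all
        return False


if __name__ == "__main__":
    if has_legacy_joblib():
        print("sklearn.externals.joblib found: scikit-learn is older than 0.23")
    else:
        print("sklearn.externals.joblib not found: use `import joblib`")
```

A `False` result in an environment where scikit-learn is installed means the old path is gone, so every `from sklearn.externals import joblib` has to be replaced.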
Solution: Direct joblib Import
The simplest and recommended fix is to replace the old import with the standalone `joblib` package:
```python
# Old way (deprecated in scikit-learn 0.21, removed in 0.23)
from sklearn.externals import joblib

# New way (correct)
import joblib
```
Quick Fix
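joblib is a required dependency of scikit-learn 0.21 and later, so it is usually installed already. If it isn't, install it directly:
```bash
pip install joblib
```
Then use:
```python
import joblib
# Your existing joblib.load() and joblib.dump() calls work unchanged
```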
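As a quick sanity check that the standalone package behaves like the old import, the sketch below dumps an object to a temporary file and loads it back. A plain dict stands in for a fitted model, and `round_trip` is a hypothetical helper name used only for illustration.

```python
import os
import tempfile

import joblib  # standalone package, not sklearn.externals.joblib


def round_trip(obj, filename="model.joblib"):
    """Hypothetical helper: save obj with joblib.dump, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        joblib.dump(obj, path)  # same call as with the old import path
        return joblib.load(path)


if __name__ == "__main__":
    params = {"alpha": 0.5, "classes": ["spam", "ham"]}  # stand-in for a model
    print(round_trip(params) == params)  # True
```

In real code you would call `joblib.dump(model, path)` and `joblib.load(path)` directly; only the import line changes.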
Handling Legacy Pickle Files
If loading a previously saved model fails (typically with `ModuleNotFoundError: No module named 'sklearn.externals.joblib'`), the pickle file itself references the old import path, and the file needs to be migrated.
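To check whether a particular file is affected before migrating it, one rough heuristic is to search its raw bytes for the old module path, since uncompressed pickle files store module names as plain strings. This is a sketch under that assumption: `references_legacy_joblib` is a hypothetical helper, and files saved with compression (for example `compress=3`) will not match even if they are affected.

```python
import os
import tempfile

# Module path that files saved via sklearn.externals.joblib refer to
LEGACY_MARKER = b"sklearn.externals.joblib"


def references_legacy_joblib(path, chunk_size=1 << 20) -> bool:
    """Hypothetical helper: True if the raw bytes of `path` contain LEGACY_MARKER.

    Only meaningful for uncompressed joblib/pickle files; compressed files will
    not match, so a False result is not conclusive for them.
    """
    overlap = len(LEGACY_MARKER) - 1
    tail = b""
    with open(path, "rb") as fh:
        # Read in chunks so large model files are not loaded into memory at once
        while chunk := fh.read(chunk_size):
            data = tail + chunk
            if LEGACY_MARKER in data:
                return True
            tail = data[-overlap:]  # keep enough bytes to catch a split marker
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Throwaway demo file; pass the path of your own model file instead
        demo = os.path.join(tmp, "demo.pkl")
        with open(demo, "wb") as fh:
            fh.write(b"...sklearn.externals.joblib.numpy_pickle...")
        print(references_legacy_joblib(demo))  # True
```

A `True` result means the file most likely needs Method 1; a `False` result is only reliable for uncompressed files.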
Method 1: Update the Pickle File
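- Temporarily install an older scikit-learn version (0.21.x or 0.22.x), ideally in a separate virtual environment. These are the last releases where both `sklearn.externals.joblib` and standalone joblib are available. scikit-learn 0.22 only ships wheels for older Python versions (up to 3.8), so that environment may also need an older interpreter:
  ```bash
  pip install scikit-learn==0.22.2
  ```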
- Create a migration script:
  ```python
  import sklearn.externals.joblib as extjoblib  # only exists in scikit-learn < 0.23
  import joblib

  # Load with the old joblib path
  model = extjoblib.load('old_model.pkl')
  # Save with standalone joblib
  joblib.dump(model, 'new_model.pkl')
  ```
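- Return to your current scikit-learn version and load `new_model.pkl` with `import joblib`. The estimator inside was still pickled by scikit-learn 0.22, so newer versions may warn about the version mismatch (or fail if the estimator's internals changed); retraining under the current version is the most robust long-term fix.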
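Once the file has been re-saved, it can help to load it in the current environment while recording warnings, because scikit-learn may emit a warning when an estimator pickled under one version is loaded under another. The sketch below is one way to do that; `load_with_warnings` is a hypothetical helper, and the dummy dict only stands in for `new_model.pkl`.

```python
import os
import tempfile
import warnings

import joblib


def load_with_warnings(path):
    """Hypothetical helper: load a joblib file, returning (object, warning texts)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        obj = joblib.load(path)
    return obj, [f"{w.category.__name__}: {w.message}" for w in caught]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Dummy stand-in for the migrated 'new_model.pkl'
        path = os.path.join(tmp, "new_model.pkl")
        joblib.dump({"weights": [0.1, 0.2]}, path)
        model, messages = load_with_warnings(path)
        for message in messages:
            print(message)
        print(f"{len(messages)} warning(s) while loading")
```

Any message mentioning a version mismatch is a sign that retraining the model under the current scikit-learn is worth considering.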
Method 2: Manual Import Workaround
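For quick testing, or for code that has to run under both old and new scikit-learn versions, you can add a compatibility layer around the import. Prefer the standalone package and fall back to the old path only when it is missing:
```python
try:
    import joblib  # standalone package (required by scikit-learn >= 0.21)
except ImportError:
    from sklearn.externals import joblib  # very old environments only
```
This only fixes the import statement; legacy pickle files still fail to load on scikit-learn 0.23+, so migrate them with Method 1.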
Complete Working Example
Here's how to properly structure your code with current best practices:
```python
import joblib  # Correct import
import boto3  # Better than subprocess for AWS operations
from botocore.exceptions import ClientError


def load_d2v(model_name, env='dev'):
    """Load the model from a local file in dev, otherwise download it from S3."""
    if env == 'dev':
        try:
            return joblib.load(model_name)
        except FileNotFoundError:
            return download_from_s3(model_name)
    return download_from_s3(model_name)


def download_from_s3(model_name):
    bucket_name = 'sd-flikku'
    key = f'datalake/doc2vec_model/{model_name}'
    # Using boto3 is more robust than shelling out via subprocess
    s3 = boto3.resource('s3')
    try:
        s3.Bucket(bucket_name).download_file(key, model_name)
        print(f'Downloaded s3://{bucket_name}/{key}')
        return joblib.load(model_name)
    except ClientError as e:
        print(f"Error downloading file: {e}")
        raise
```
AWS Security Note
When working with S3, use IAM roles and permissions instead of hardcoded credentials for better security practices.
Version Compatibility Table
| scikit-learn version | joblib import | Status of `sklearn.externals.joblib` |
|---|---|---|
| < 0.21 | `from sklearn.externals import joblib` (bundled copy) | Supported |
| 0.21-0.22 | `import joblib` (old path still works, with a deprecation warning) | Deprecated |
| ≥ 0.23 | `import joblib` only | Removed |
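If you need to decide programmatically which import a given environment supports (for example, in a script that inspects pinned requirements), the table can be encoded as a small lookup. This is an illustrative sketch: `joblib_import_for` and `JoblibImport` are hypothetical names, and only the major and minor version numbers are compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import NamedTuple


class JoblibImport(NamedTuple):
    statement: str      # import line to use in code for that version
    legacy_status: str  # state of sklearn.externals.joblib in that version


def joblib_import_for(sklearn_version: str) -> JoblibImport:
    """Hypothetical helper mapping a scikit-learn version to a table row."""
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor < (0, 21):
        return JoblibImport("from sklearn.externals import joblib", "supported")
    if major_minor < (0, 23):
        return JoblibImport("import joblib", "deprecated")
    return JoblibImport("import joblib", "removed")


if __name__ == "__main__":
    try:
        print(joblib_import_for(version("scikit-learn")))
    except PackageNotFoundError:
        print("scikit-learn is not installed")
```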
Additional Recommendations
- Update your development environment:
  ```bash
  pip install --upgrade scikit-learn joblib
  ```
- Check your codebase for other imports from `sklearn.externals`, which were removed along with joblib, for example:
  ```python
  # Also removed in newer scikit-learn versions
  from sklearn.externals import six
  ```
- Test your code with different scikit-learn versions using virtual environments to ensure compatibility.
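A rough way to find remaining `sklearn.externals` imports is a line-by-line regular-expression scan over your `.py` files. The sketch below assumes imports sit on their own lines (it will not catch imports built dynamically or split across lines), and `find_legacy_imports` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches "from sklearn.externals[...] import ..." and "import sklearn.externals..."
LEGACY_IMPORT = re.compile(
    r"^\s*(?:from\s+sklearn\.externals(?:\.\w+)*\s+import\b"
    r"|import\s+sklearn\.externals\b)"
)


def find_legacy_imports(root):
    """Hypothetical helper: yield (path, line number, line) for legacy imports."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY_IMPORT.match(line):
                yield path, lineno, line.strip()


if __name__ == "__main__":
    # "src" is a hypothetical source directory; avoid scanning your virtualenv
    for path, lineno, line in find_legacy_imports("src"):
        print(f"{path}:{lineno}: {line}")
```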
Conclusion
The ImportError occurs because scikit-learn removed the bundled `joblib` reference in version 0.23. The solution is to:
- Use `import joblib` directly instead of the deprecated `from sklearn.externals import joblib`
- Update any legacy pickle files that reference the old import path
- Ensure joblib is installed as a separate dependency
By following these steps, you'll maintain compatibility with current scikit-learn versions while preserving access to your existing models.