Fixing ImportError: cannot import name 'joblib' from 'sklearn.externals'

Problem Overview

When trying to import joblib from sklearn.externals, you encounter the error:

python
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals'

This error typically occurs with older code written for scikit-learn versions before 0.23, where joblib could be imported through sklearn.externals. That bundled copy was deprecated in 0.21 and removed entirely in 0.23, so the import fails on any newer version.

Compatibility Issue

The error indicates your code or saved models were created with an older version of scikit-learn and need to be updated to work with current versions.

Solution: Direct joblib Import

The simplest and recommended solution is to replace the old import with a direct one:

python
# Old way (deprecated in 0.21, removed in 0.23)
from sklearn.externals import joblib

# New way (correct)
import joblib
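
When the old import is scattered across many files, the replacement can be scripted. The following is a minimal sketch using only the standard library. `rewrite_joblib_imports` is a hypothetical helper name, not part of scikit-learn or joblib, and the regular expressions cover only the most common spellings of the import, so the output should be reviewed before it is committed.

```python
import re

# Matches a whole-line "from sklearn.externals import joblib",
# optionally followed by a trailing comment.
_FROM_EXTERNALS = re.compile(
    r"^([ \t]*)from[ \t]+sklearn\.externals[ \t]+import[ \t]+joblib"
    r"(?=[ \t]*(?:#.*)?$)",
    re.MULTILINE,
)
# Matches the dotted module path wherever it appears
# (this includes comments and string literals).
_DOTTED_PATH = re.compile(r"\bsklearn\.externals\.joblib\b")


def rewrite_joblib_imports(source: str) -> str:
    """Return `source` with sklearn.externals joblib imports replaced."""
    source = _FROM_EXTERNALS.sub(r"\1import joblib", source)
    return _DOTTED_PATH.sub("joblib", source)


legacy = "from sklearn.externals import joblib\nmodel = joblib.load('model.pkl')\n"
print(rewrite_joblib_imports(legacy))
```

Lines that import several names at once, such as `from sklearn.externals import joblib, six`, are deliberately left untouched because they need a manual decision.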

Quick Fix

joblib is a dependency of scikit-learn, so it is usually already present. If it is not, install it directly:

bash
pip install joblib

Then use:

python
import joblib
# Your existing joblib.load() and joblib.dump() calls will work
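
Before changing any imports, it can help to confirm that the standalone package is visible to the interpreter you are actually running. This is a small sketch using the standard library's `importlib.util.find_spec`. The helper name `is_installed` is an assumption made for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("joblib"):
    print("joblib is available")
else:
    print("joblib is missing; install it with: pip install joblib")
```

Running this with the same interpreter as your application catches the common case where joblib is installed in a different virtual environment.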

Handling Legacy Pickle Files

If you encounter errors when loading previously saved models, the issue may be that your pickle files reference the old import path. Here's how to resolve this:

Method 1: Update the Pickle File

  1. Temporarily install an older scikit-learn version (0.21.x or 0.22.x), in which both import paths still work:
bash
pip install scikit-learn==0.22.2
  2. Create a migration script:
python
import sklearn.externals.joblib as extjoblib
import joblib

# Load with the old bundled joblib
model = extjoblib.load('old_model.pkl')

# Re-save with the standalone joblib
joblib.dump(model, 'new_model.pkl')
  3. Return to your current scikit-learn version and load the updated file with joblib.load().
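
If several model files need migrating, the load-and-resave step can be wrapped in a loop. The sketch below assumes the files sit in one directory and share a `.pkl` suffix. `migrate_models` and its parameters are hypothetical names, and the loader and dumper are passed in so the caller decides which joblib reads and which one writes.

```python
from pathlib import Path


def migrate_models(src_dir, dst_dir, load, dump, pattern="*.pkl"):
    """Re-save every file matching `pattern` in src_dir into dst_dir.

    `load` reads one file path and returns the object;
    `dump` writes an object to a file path.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    migrated = []
    for src_file in sorted(Path(src_dir).glob(pattern)):
        model = load(str(src_file))
        out_file = dst / src_file.name
        dump(model, str(out_file))
        migrated.append(out_file)
    return migrated
```

Inside the temporary scikit-learn 0.22 environment, this might be called as `migrate_models('models', 'models_migrated', extjoblib.load, joblib.dump)`, where the directory names are placeholders and `extjoblib` and `joblib` are imported as in the migration script. Writing to a separate directory keeps the original files intact in case a model fails to load.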

Method 2: Manual Import Workaround

For quick testing, you can create a compatibility layer:

python
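try:
    # Preferred: the standalone package (works on every scikit-learn version)
    import joblib
except ImportError:
    # Fallback for old environments that only have the bundled copy
    from sklearn.externals import joblib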
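
The same fallback idea can be written as a small reusable function when more than one module needs it. This is only a sketch built on the standard library's `importlib.import_module`. The name `import_first` is an assumption made for illustration.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is importable."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")
```

A call such as `joblib = import_first("joblib", "sklearn.externals.joblib")` would then prefer the standalone package and use the bundled copy only where nothing else exists.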

Complete Working Example

Here's how to properly structure your code with current best practices:

python
import pandas as pd
import numpy as np
import joblib  # Correct import
import boto3  # Better than subprocess for AWS operations
from botocore.exceptions import ClientError

def load_d2v(model_name, env='dev'):
    """Load the model locally in dev, falling back to S3 if the file is missing."""
    if env == 'dev':
        try:
            return joblib.load(model_name)
        except FileNotFoundError:
            return download_from_s3(model_name)
    return download_from_s3(model_name)

def download_from_s3(model_name):
    """Download the model file from S3, then load it with joblib."""
    bucket_name = 'sd-flikku'
    key = f'datalake/doc2vec_model/{model_name}'

    # Using boto3 is more robust than shelling out via subprocess
    s3 = boto3.resource('s3')

    try:
        s3.Bucket(bucket_name).download_file(key, model_name)
        print(f'Downloaded s3://{bucket_name}/{key} to {model_name}')
        return joblib.load(model_name)
    except ClientError as e:
        print(f"Error downloading file: {e}")
        raise

AWS Security Note

When working with S3, use IAM roles and permissions instead of hardcoded credentials for better security practices.

Version Compatibility Table

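| scikit-learn version | joblib location | Status |
|---|---|---|
| < 0.21 | sklearn.externals.joblib | Bundled copy, no warning |
| 0.21-0.22 | Both available (sklearn.externals.joblib emits a deprecation warning) | Deprecated, transitional |
| ≥ 0.23 | import joblib only | Current standard |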
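
The table can also be expressed as a lookup, which is handy in a script that has to report on several environments. The sketch below only inspects a version string. `externals_joblib_status` and its three return labels are hypothetical names chosen to mirror the table rows.

```python
import re


def externals_joblib_status(sklearn_version: str) -> str:
    """Classify sklearn.externals.joblib for a given scikit-learn version."""
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor < (0, 21):
        return "bundled"      # importable, no warning
    if major_minor < (0, 23):
        return "deprecated"   # importable, but emits a warning
    return "removed"          # only "import joblib" works


for version in ("0.20.3", "0.22.2", "1.3.0"):
    print(version, externals_joblib_status(version))
```

Only the major and minor components are compared, so suffixes such as `.post1` or `rc1` do not affect the result.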

Additional Recommendations

  1. Update your development environment:
bash
pip install --upgrade scikit-learn joblib
  2. Check for other sklearn.externals imports in your codebase, particularly:
python
# Deprecated alongside joblib in 0.21 and no longer available in current versions;
# use the standalone six package instead
from sklearn.externals import six
  3. Test your code with different scikit-learn versions using virtual environments to ensure compatibility.
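
When comparing virtual environments, it is useful to print which versions each one actually has. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) follows. The helper name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for dist in ("scikit-learn", "joblib"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

Reading the package metadata avoids importing scikit-learn itself, so the check still works in an environment where the import would fail.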

Conclusion

The ImportError occurs because scikit-learn removed its bundled copy of joblib (sklearn.externals.joblib) in version 0.23. The solution is to:

  1. Use import joblib directly instead of the removed from sklearn.externals import joblib
  2. Update any legacy pickle files that reference the old import path
  3. Ensure joblib is installed as a separate dependency

By following these steps, you'll maintain compatibility with current scikit-learn versions while preserving access to your existing models.