Fixing ImportError: cannot import name 'joblib' from 'sklearn.externals'

Problem Overview

When trying to import joblib from sklearn.externals, you encounter the error:

python

from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals'

This issue typically occurs with older code written for scikit-learn versions prior to 0.23, which shipped a vendored copy of joblib under sklearn.externals. That vendored copy was deprecated in 0.21 and removed in 0.23, so the import fails on any newer release.

Compatibility Issue

The error indicates your code or saved models were created with an older version of scikit-learn and need to be updated to work with current versions.

Solution: Direct joblib Import

The simplest and recommended solution is to replace the deprecated import:

python

# Old way (deprecated)
from sklearn.externals import joblib

# New way (correct)
import joblib
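
If the old import appears in many files, the replacement can be automated. The following is a minimal sketch, not a complete refactoring tool: the function name `rewrite_joblib_imports` is made up for illustration, and it only handles the two simplest spellings of the import (it will not touch combined imports such as `from sklearn.externals import joblib, six`).

```python
import re

# Two common spellings of the old import, each alone on its line:
#   from sklearn.externals import joblib
#   import sklearn.externals.joblib as <alias>
_FROM_IMPORT = re.compile(
    r"^([ \t]*)from[ \t]+sklearn\.externals[ \t]+import[ \t]+joblib[ \t]*$",
    re.MULTILINE,
)
_AS_IMPORT = re.compile(
    r"^([ \t]*)import[ \t]+sklearn\.externals\.joblib[ \t]+as[ \t]+(\w+)[ \t]*$",
    re.MULTILINE,
)


def rewrite_joblib_imports(source: str) -> str:
    """Return `source` with the old joblib imports replaced, keeping indentation."""
    source = _FROM_IMPORT.sub(r"\1import joblib", source)
    source = _AS_IMPORT.sub(r"\1import joblib as \2", source)
    return source


if __name__ == "__main__":
    old = "from sklearn.externals import joblib\nmodel = joblib.load('m.pkl')\n"
    print(rewrite_joblib_imports(old))
```

Because the alias is preserved, existing calls such as `joblib.load()` keep working without further changes.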

Quick Fix

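joblib is a required dependency of scikit-learn 0.21 and later, so it is normally already installed. If it is missing from your environment, install it directly: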

bash

pip install joblib

Then use:

python

import joblib
# Your existing joblib.load() and joblib.dump() calls will work

Handling Legacy Pickle Files

If you encounter errors when loading previously saved models, the issue may be that your pickle files reference the old import path. Here's how to resolve this:
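
Before migrating anything, it can help to check which saved files actually mention the old path. The sketch below is a heuristic only: it assumes the file was written without compression, so that module names appear as plain bytes, and the helper name `references_old_joblib` is invented for this example. A `False` result on a compressed file proves nothing.

```python
from pathlib import Path

OLD_MODULE = b"sklearn.externals.joblib"


def references_old_joblib(path, chunk_size=1 << 20):
    """Return True if the raw bytes of `path` mention the old module path."""
    overlap = len(OLD_MODULE) - 1
    tail = b""
    with Path(path).open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if OLD_MODULE in window:
                return True
            # Keep the last few bytes so a match split across chunks is still found.
            tail = window[-overlap:]
```

Files that return `True` are the ones worth running through one of the methods below.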

Method 1: Update the Pickle File

Temporarily install an older scikit-learn version (0.21.x or 0.22.x):

bash

pip install scikit-learn==0.22.2

Create a migration script:

python

import sklearn.externals.joblib as extjoblib
import joblib

# Load with old method
model = extjoblib.load('old_model.pkl')

# Save with new method
joblib.dump(model, 'new_model.pkl')

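Then reinstall your current scikit-learn version and load new_model.pkl with the standalone joblib. scikit-learn does not guarantee that an estimator pickled under one version behaves identically under another, so verify the migrated model's predictions after upgrading.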

Method 2: Manual Import Workaround

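For quick testing, or for code that must run on both old and new scikit-learn installations, you can create a compatibility layer. Try the standalone package first so that the deprecated path is only used as a fallback:

python

try:
    import joblib
except ImportError:
    # Very old environments without the standalone package
    from sklearn.externals import joblib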
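
The same fallback idea can be written as a small reusable helper. This is a sketch under the assumption that you want the first importable module from an ordered list; `import_first_available` is a hypothetical name, not part of joblib or scikit-learn.

```python
import importlib


def import_first_available(*module_names):
    """Import and return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")
```

A call such as `joblib = import_first_available("joblib", "sklearn.externals.joblib")` prefers the standalone package and only falls back to the vendored copy on old installations.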

Complete Working Example

Here's how to properly structure your code with current best practices:

python

import joblib  # Correct import
import boto3  # Better than subprocess for AWS operations
from botocore.exceptions import ClientError

def load_d2v(model_name, env='dev'):
    # In dev, prefer a local copy and fall back to S3 if it is missing
    if env == 'dev':
        try:
            return joblib.load(model_name)
        except FileNotFoundError:
            return download_from_s3(model_name)
    else:
        return download_from_s3(model_name)

def download_from_s3(model_name):
    bucket_name = 'sd-flikku'
    key = f'datalake/doc2vec_model/{model_name}'

    # Using boto3 is more robust than subprocess
    s3 = boto3.resource('s3')

    try:
        # Saves the object to a local file named model_name, then loads it
        s3.Bucket(bucket_name).download_file(key, model_name)
        print(f'Downloaded s3://{bucket_name}/{key}')
        return joblib.load(model_name)
    except ClientError as e:
        print(f"Error downloading file: {e}")
        raise

AWS Security Note

When working with S3, use IAM roles and permissions instead of hardcoded credentials for better security practices.

Version Compatibility Table

scikit-learn Version	joblib Location	Status
< 0.21	`sklearn.externals.joblib` (vendored copy)	Supported
0.21-0.22	Both available (`sklearn.externals.joblib` emits a deprecation warning)	Transitional
≥ 0.23	Only `import joblib`	Current standard
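
The version boundaries in the table can be expressed as a small check, for example to decide in a setup script which import to expect. This is a minimal sketch: the function name and the three status labels are chosen for illustration, and only the leading major.minor part of the version string is examined.

```python
import re


def externals_joblib_status(sklearn_version: str) -> str:
    """Classify how a scikit-learn version treats sklearn.externals.joblib."""
    # Use the leading "major.minor" and ignore suffixes such as ".2" or "rc1".
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {sklearn_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))

    if major_minor < (0, 21):
        return "vendored"    # sklearn.externals.joblib is the normal import
    if major_minor < (0, 23):
        return "deprecated"  # both imports work; the old one warns
    return "removed"         # only `import joblib` works


if __name__ == "__main__":
    for version in ("0.20.3", "0.22.2", "1.4.0"):
        print(version, externals_joblib_status(version))
```

In a real environment the version string would typically come from `sklearn.__version__`.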

Additional Recommendations

Update your development environment:

bash

pip install --upgrade scikit-learn joblib

Check for other deprecated imports in your codebase, particularly:

python

# Removed in 0.23 together with the vendored joblib;
# install the standalone `six` package if you still need it
from sklearn.externals import six

Test your code with different scikit-learn versions using virtual environments to ensure compatibility.
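
To locate remaining deprecated imports across a project, a short scan of the source tree is usually enough. The sketch below assumes plain `.py` files and flags any import that goes through `sklearn.externals`; the function name `find_externals_imports` is hypothetical.

```python
import re
from pathlib import Path

# Any import statement whose module path starts with sklearn.externals.
EXTERNALS_IMPORT = re.compile(r"^\s*(?:from|import)\s+sklearn\.externals\b")


def find_externals_imports(root):
    """Yield (path, line_number, line) for each sklearn.externals import under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if EXTERNALS_IMPORT.match(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    for path, number, line in find_externals_imports("."):
        print(f"{path}:{number}: {line}")
```

Each reported line is a candidate for the direct-import replacement described earlier.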

Conclusion

The ImportError occurs because scikit-learn removed the internal joblib reference in version 0.23. The solution is to:

Use `import joblib` directly instead of the deprecated `from sklearn.externals import joblib`
Update any legacy pickle files that reference the old import path
Ensure joblib is installed as a separate dependency

By following these steps, you'll maintain compatibility with current scikit-learn versions while preserving access to your existing models.
