Skip to content

Pandas cannot open Excel (.xlsx) file

When working with Excel files in Python using pandas, you may encounter the error xlrd.biffh.XLRDError: Excel xlsx file; not supported. This article explains why this error occurs and provides the recommended solutions.

Problem Overview

The error occurs when trying to read Excel files with the .xlsx extension using pandas' read_excel() function:

python
import pandas as pd
df = pd.read_excel('cat.xlsx')

The error message indicates that xlrd, the underlying library pandas uses to read Excel files, doesn't support .xlsx files in its recent versions.

Root Cause

Starting with version 2.0.0, xlrd explicitly removed support for .xlsx files due to security vulnerabilities in how it handled these files in earlier versions (1.2.0 and below). This security concern affects the default engine pandas uses for Excel file operations.

The most secure and up-to-date approach is to:

  1. Ensure you have a recent version of pandas (1.2.0 or later)
  2. Install the openpyxl library:
bash
pip install openpyxl
  1. Explicitly specify the engine when reading Excel files:
python
import pandas as pd
df = pd.read_excel('cat.xlsx', engine='openpyxl')

Solution 2: Upgrade pandas

Recent versions of pandas (≥1.2.0) have improved Excel file handling. Simply upgrading pandas may resolve the issue:

bash
pip install --upgrade pandas

Solution 3: Using ExcelFile constructor with openpyxl

If you need to work with multiple worksheets, use the ExcelFile constructor:

python
import pandas as pd

xlsx = pd.ExcelFile('cat.xlsx', engine='openpyxl')
df = pd.read_excel(xlsx, sheet_name='Sheet1')

SECURITY WARNING

Avoid downgrading xlrd to version 1.2.0 as suggested in some older answers. This version has known security vulnerabilities that could expose your system to risks when processing malicious Excel files.

Version Compatibility

The following table shows the compatibility between pandas versions and Excel file formats:

Pandas Version.xls Support.xlsx SupportRecommended Engine
< 1.0.1xlrdxlrd (v1.2.0)Not recommended
1.0.1 - 1.1.xxlrdopenpyxlopenpyxl
≥ 1.2.0xlrdopenpyxlopenpyxl

Best Practices

  1. Always specify the engine explicitly when reading Excel files to avoid unexpected behavior
  2. Keep pandas updated to benefit from the latest security patches and features
  3. Use openpyxl for .xlsx files and xlrd for legacy .xls files
  4. Avoid modifying pandas source code as suggested in some answers, as this creates maintenance issues

Common Use Cases

Reading multiple sheets

python
# Using ExcelFile for multiple operations
excel_file = pd.ExcelFile('data.xlsx', engine='openpyxl')

# Read specific sheets
df1 = pd.read_excel(excel_file, sheet_name='Sheet1')
df2 = pd.read_excel(excel_file, sheet_name='Sheet2')

Reading with additional options

python
# Read with specific options
df = pd.read_excel('data.xlsx', 
                   engine='openpyxl',
                   sheet_name='Sheet1',
                   header=0,
                   usecols='A:C',
                   skiprows=1)

Troubleshooting

If you continue to experience issues:

  1. Verify your pandas version:
python
import pandas as pd
print(pd.__version__)
  1. Check installed dependencies:
bash
pip list | grep -E "pandas|xlrd|openpyxl"
  1. Ensure the Excel file isn't corrupted by trying to open it with Excel or another spreadsheet application

By following these guidelines, you can reliably work with Excel files in pandas while maintaining security and performance standards.