Pandas cannot open Excel (.xlsx) file
When working with Excel files in Python using pandas, you may encounter the error xlrd.biffh.XLRDError: Excel xlsx file; not supported
. This article explains why this error occurs and provides the recommended solutions.
Problem Overview
The error occurs when trying to read Excel files with the .xlsx
extension using pandas' read_excel()
function:
import pandas as pd
df = pd.read_excel('cat.xlsx')
The error message indicates that xlrd, the underlying library pandas uses to read Excel files, doesn't support .xlsx
files in its recent versions.
Root Cause
Starting with version 2.0.0, xlrd explicitly removed support for .xlsx
files due to security vulnerabilities in how it handled these files in earlier versions (1.2.0 and below). This security concern affects the default engine pandas uses for Excel file operations.
Recommended Solutions
Solution 1: Upgrade pandas and use openpyxl (Recommended)
The most secure and up-to-date approach is to:
- Ensure you have a recent version of pandas (1.2.0 or later)
- Install the
openpyxl
library:
pip install openpyxl
- Explicitly specify the engine when reading Excel files:
import pandas as pd
df = pd.read_excel('cat.xlsx', engine='openpyxl')
Solution 2: Upgrade pandas
Recent versions of pandas (≥1.2.0) have improved Excel file handling. Simply upgrading pandas may resolve the issue:
pip install --upgrade pandas
Solution 3: Using ExcelFile constructor with openpyxl
If you need to work with multiple worksheets, use the ExcelFile
constructor:
import pandas as pd
xlsx = pd.ExcelFile('cat.xlsx', engine='openpyxl')
df = pd.read_excel(xlsx, sheet_name='Sheet1')
SECURITY WARNING
Avoid downgrading xlrd to version 1.2.0 as suggested in some older answers. This version has known security vulnerabilities that could expose your system to risks when processing malicious Excel files.
Version Compatibility
The following table shows the compatibility between pandas versions and Excel file formats:
Pandas Version | .xls Support | .xlsx Support | Recommended Engine |
---|---|---|---|
< 1.0.1 | xlrd | xlrd (v1.2.0) | Not recommended |
1.0.1 - 1.1.x | xlrd | openpyxl | openpyxl |
≥ 1.2.0 | xlrd | openpyxl | openpyxl |
Best Practices
- Always specify the engine explicitly when reading Excel files to avoid unexpected behavior
- Keep pandas updated to benefit from the latest security patches and features
- Use openpyxl for .xlsx files and xlrd for legacy .xls files
- Avoid modifying pandas source code as suggested in some answers, as this creates maintenance issues
Common Use Cases
Reading multiple sheets
# Using ExcelFile for multiple operations
excel_file = pd.ExcelFile('data.xlsx', engine='openpyxl')
# Read specific sheets
df1 = pd.read_excel(excel_file, sheet_name='Sheet1')
df2 = pd.read_excel(excel_file, sheet_name='Sheet2')
Reading with additional options
# Read with specific options
df = pd.read_excel('data.xlsx',
engine='openpyxl',
sheet_name='Sheet1',
header=0,
usecols='A:C',
skiprows=1)
Troubleshooting
If you continue to experience issues:
- Verify your pandas version:
import pandas as pd
print(pd.__version__)
- Check installed dependencies:
pip list | grep -E "pandas|xlrd|openpyxl"
- Ensure the Excel file isn't corrupted by trying to open it with Excel or another spreadsheet application
By following these guidelines, you can reliably work with Excel files in pandas while maintaining security and performance standards.