Excel xlsx file not supported in xlrd
Problem
When reading Excel files with pandas.read_excel() and the xlrd library, you may encounter the error:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
This error commonly occurs when:
- Reading .xlsx or .xlsm (macro-enabled) Excel files
- Using xlrd 2.0.0 or later
- Running code in production environments like Pivotal Cloud Foundry (PCF)
The root cause is that xlrd 2.0.0+ no longer supports .xlsx files, only the legacy .xls format.
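To confirm that the installed xlrd is the cause, you can check its version. The following is a minimal sketch; the helper names major_version and xlrd_reads_xlsx are hypothetical and only illustrate the 2.0.0 cutoff described above.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.1'."""
    return int(version.split(".")[0])


def xlrd_reads_xlsx(version: str) -> bool:
    """xlrd 2.0.0+ reads only .xls, so only 1.x releases can open .xlsx."""
    return major_version(version) < 2


if __name__ == "__main__":
    try:
        installed = metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        print("xlrd is not installed")
    else:
        print(f"xlrd {installed}: can read .xlsx = {xlrd_reads_xlsx(installed)}")
```

If this reports that xlrd cannot read .xlsx, switching engines as described below is the safer fix than downgrading.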
Solutions
Recommended Solution: Use openpyxl
The recommended fix is to use the openpyxl engine, which supports both .xlsx and .xlsm files:
import os
import pandas as pd

# APP_PATH is assumed to be defined elsewhere as the application's base directory
df1 = pd.read_excel(
    os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
    engine='openpyxl'
)
Prerequisites
Make sure you have the required packages installed:
pip install pandas openpyxl
For optimal compatibility, ensure you're using:
- pandas >= 1.0.1 (preferably the latest version)
- openpyxl >= 3.0.0
Alternative Engines
If openpyxl doesn't meet your needs, consider these alternatives:
# For Excel Binary (.xlsb) files
pip install pyxlsb
# For advanced Excel integration
pip install xlwings
Legacy Approach (Not Recommended)
Security Warning
Using xlrd 1.2.0 is not recommended due to potential security vulnerabilities. Only consider this if absolutely necessary and with proper risk assessment.
pip install xlrd==1.2.0
Best Practices
- Always specify the engine parameter when reading Excel files:
# For .xlsx files
df = pd.read_excel('file.xlsx', engine='openpyxl')
# For .xls files
df = pd.read_excel('file.xls', engine='xlrd')
- Check file extensions and use appropriate engines:
import os
filename = 'data.xlsm'
extension = os.path.splitext(filename)[1].lower()
if extension in ['.xlsx', '.xlsm']:
    engine = 'openpyxl'
elif extension == '.xls':
    engine = 'xlrd'
elif extension == '.xlsb':
    engine = 'pyxlsb'
else:
    raise ValueError(f"Unsupported file format: {extension}")
df = pd.read_excel(filename, engine=engine)
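The extension check above can also be wrapped in a small reusable helper. This is a sketch only: pick_engine, read_excel_auto, and ENGINE_BY_EXTENSION are hypothetical names, not part of pandas, and the mapping simply mirrors the branches shown above.

```python
from pathlib import Path

# Extension-to-engine mapping, mirroring the if/elif chain above.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
}


def pick_engine(filename):
    """Return the pandas engine name for an Excel file, based on its extension."""
    extension = Path(filename).suffix.lower()
    try:
        return ENGINE_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unsupported file format: {extension!r}") from None


def read_excel_auto(filename, **kwargs):
    """Read an Excel file with the engine chosen by pick_engine."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(filename, engine=pick_engine(filename), **kwargs)
```

Usage would look like df = read_excel_auto('data.xlsm'). Note that this trusts the file extension; a file that was renamed without being converted will still fail when the engine opens it.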
Why This Happened
The xlrd library dropped support for .xlsx files in version 2.0.0, citing security concerns, and now reads only the legacy .xls format. This change was documented in:
- The xlrd release notes
- Library documentation with prominent warnings
- PyPI project description
Production Deployment
For cloud environments like PCF, ensure your requirements.txt includes:
pandas>=1.2.0
openpyxl>=3.0.0
Avoid pinning to older, insecure versions of xlrd, as this may introduce security risks in your production applications.
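One way to catch a missing dependency early in a deployed application is to check for the packages at startup, before any Excel file is read. The sketch below assumes the two packages from the requirements above; missing_packages and check_excel_dependencies are hypothetical helper names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def check_excel_dependencies(required=("pandas", "openpyxl")):
    """Raise a clear error at startup if a required package is absent."""
    missing = missing_packages(required)
    if missing:
        raise RuntimeError("Missing required packages: " + ", ".join(missing))
```

Calling check_excel_dependencies() once during application startup turns a late read_excel failure into an immediate, explicit error in the deployment logs.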
Summary
| File Type | Recommended Engine | Alternative Engine |
|---|---|---|
| .xlsx | openpyxl | - |
| .xlsm | openpyxl | - |
| .xls | xlrd | - |
| .xlsb | pyxlsb | - |
Always use the appropriate engine for your Excel file format, and avoid pinning outdated xlrd versions, to ensure both functionality and security in your applications.