Excel xlsx file not supported in xlrd
Problem
When attempting to read Excel files using pandas.read_excel() with the xlrd library, you may encounter the error:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
This error commonly occurs when:
- Reading .xlsx or .xlsm (macro-enabled) Excel files
- Using xlrd 2.0.0 or later
- Running code in production environments like Pivotal Cloud Foundry (PCF)
The root cause is that xlrd 2.0.0+ no longer supports .xlsx files, only the legacy .xls format.
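To confirm whether this applies to a given environment, it can help to check the installed xlrd version at runtime. The following is a minimal sketch, not part of xlrd or pandas: the helper names `xlrd_reads_xlsx` and `installed_xlrd_version` are hypothetical, and the check assumes a version string that starts with a numeric major version such as `2.0.1`.

```python
from importlib import metadata
from typing import Optional


def xlrd_reads_xlsx(version: str) -> bool:
    """Return True if this xlrd version can still read .xlsx (major version < 2)."""
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version() -> Optional[str]:
    """Return the installed xlrd version string, or None if xlrd is not installed."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_xlrd_version()
    if version is None:
        print("xlrd is not installed")
    elif xlrd_reads_xlsx(version):
        print(f"xlrd {version} can read .xlsx files")
    else:
        print(f"xlrd {version} reads .xls only; use openpyxl for .xlsx/.xlsm")
```

Running this in the failing environment shows whether the deployed xlrd is a 2.x release, which is the case described above.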
Solutions
Recommended Solution: Use openpyxl
The optimal solution is to use the openpyxl engine, which properly supports both .xlsx and .xlsm files:
import os
import pandas as pd
# APP_PATH is assumed to be defined elsewhere as the application's base directory
df1 = pd.read_excel(
    os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
    engine='openpyxl'
)
Prerequisites
Make sure you have the required packages installed:
pip install pandas openpyxl
For optimal compatibility, ensure you're using:
- pandas >= 1.0.1 (preferably the latest version)
- openpyxl >= 3.0.0
Alternative Engines
If openpyxl doesn't meet your needs, consider these alternatives:
# For Excel Binary (.xlsb) files
pip install pyxlsb
# For advanced Excel integration (a separate library, not a pandas engine)
pip install xlwings
Legacy Approach (Not Recommended)
Security Warning
Using xlrd 1.2.0 is not recommended due to potential security vulnerabilities. Only consider this if absolutely necessary and with proper risk assessment.
pip install xlrd==1.2.0
Best Practices
- Always specify the engine parameter when reading Excel files:
# For .xlsx files
df = pd.read_excel('file.xlsx', engine='openpyxl')
# For .xls files
df = pd.read_excel('file.xls', engine='xlrd')
- Check file extensions and use appropriate engines:
import os
import pandas as pd
filename = 'data.xlsm'
extension = os.path.splitext(filename)[1].lower()
# Map the file extension to the pandas engine that can read it
if extension in ['.xlsx', '.xlsm']:
    engine = 'openpyxl'
elif extension == '.xls':
    engine = 'xlrd'
elif extension == '.xlsb':
    engine = 'pyxlsb'
else:
    raise ValueError(f"Unsupported file format: {extension}")
df = pd.read_excel(filename, engine=engine)
Why This Happened
The xlrd library dropped support for .xlsx files in version 2.0.0 to address security concerns and to focus on maintaining support for the legacy .xls format. This change was documented in:
- The xlrd release notes
- Library documentation with prominent warnings
- PyPI project description
Production Deployment
For cloud environments like PCF, ensure your requirements.txt includes:
pandas>=1.2.0
openpyxl>=3.0.0
Avoid pinning to older, insecure versions of xlrd, as this may introduce security risks in your production applications.
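In a deployed application, a missing dependency often surfaces only when the first Excel file is read. One way to catch it earlier is a startup check like the sketch below. The function names `missing_packages` and `check_excel_dependencies` are hypothetical, and the package list is an assumption based on the requirements shown above.

```python
from importlib import util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if util.find_spec(name) is None]


def check_excel_dependencies() -> None:
    """Raise early, with a clear message, if the Excel reading stack is incomplete."""
    # Assumed package list; extend it (e.g. with "xlrd" or "pyxlsb") as needed.
    missing = missing_packages(["pandas", "openpyxl"])
    if missing:
        raise RuntimeError(
            "Missing packages: " + ", ".join(missing)
            + ". Add them to requirements.txt and redeploy."
        )
```

Calling `check_excel_dependencies()` once at application startup turns a late read failure into an immediate, explicit error.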
Summary
| File Type | Recommended Engine | Alternative Engine |
|---|---|---|
| .xlsx | openpyxl | - |
| .xlsm | openpyxl | - |
| .xls | xlrd | - |
| .xlsb | pyxlsb | - |
Always use the appropriate engine for your Excel file format and avoid deprecated xlrd versions to ensure both functionality and security in your applications.
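File extensions are not always reliable, for example when a file has been renamed. As a complementary heuristic, the container format can be inferred from the first bytes of the file: legacy .xls workbooks use the OLE2 compound file container, while .xlsx, .xlsm, and .xlsb are ZIP archives. The sketch below uses only the standard library. The function name `sniff_excel_container` is hypothetical, and the result cannot tell .xlsx apart from .xlsb, so the extension is still needed for that distinction.

```python
import zipfile
from pathlib import Path

# Signature of the OLE2 compound file container used by legacy .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_excel_container(path) -> str:
    """Return 'ole2' (typically legacy .xls), 'zip' (.xlsx/.xlsm/.xlsb), or 'unknown'."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    if header == OLE2_MAGIC:
        return "ole2"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

A result of `'ole2'` suggests `engine='xlrd'`, while `'zip'` suggests `openpyxl` or `pyxlsb` depending on the extension.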