Introduction
Python has a rich ecosystem of libraries that make it an ideal language for data analysis. One of those libraries is pandas, which simplifies the process of reading and writing data between in-memory data structures and different file formats.
However, while working with Excel files using pandas.read_excel, you might run into an error that looks like this:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
In this Byte, we’ll dissect this error message, understand why it occurs, and learn how to fix it.
What is the Error “xlrd.biffh.XLRDError”
The xlrd.biffh.XLRDError is a specific error message that you might encounter while working with the pandas library in Python. This error is raised when you try to read an Excel file with the .xlsx extension using the pandas.read_excel function and the file ends up being handled by the xlrd engine.
Here is an example of the error:
import pandas as pd
df = pd.read_excel('file.xlsx')
Output:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
Cause of the Error
The xlrd.biffh.XLRDError error is caused by a change in the xlrd library that pandas uses to read Excel files. The xlrd library now only supports the older .xls file format and no longer supports the newer .xlsx file format.
This change can be a bit of a surprise if you’ve been using pandas.read_excel with xlrd. Older versions of pandas use the xlrd library by default to read Excel files, but as of xlrd version 2.0.0, this library no longer supports .xlsx files.
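To see whether this version boundary applies to your environment, you can check which versions are installed. The following is a minimal sketch that uses only the standard library; the helper names installed_version and drops_xlsx are hypothetical and not part of pandas or xlrd.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def drops_xlsx(xlrd_version):
    """Return True if this xlrd version is 2.0.0 or later (no .xlsx support)."""
    major = int(xlrd_version.split(".")[0])
    return major >= 2


# Report the packages involved in reading Excel files.
for name in ("pandas", "xlrd", "openpyxl"):
    print(name, installed_version(name) or "not installed")

xlrd_version = installed_version("xlrd")
if xlrd_version is not None and drops_xlsx(xlrd_version):
    print("This xlrd release reads .xls files only.")
```

If xlrd reports 2.0.0 or later and openpyxl is not installed, reading an .xlsx file is likely to fail in the way described above.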
How to Fix the Error
The solution to this error is simple. You just need to install openpyxl and specify the engine argument in the pandas.read_excel method to use the openpyxl library instead of xlrd. The openpyxl library reads the newer .xlsx file format. It does not read legacy .xls files, which still require xlrd.
Here’s how to do it:
First, you need to install the openpyxl library. You can do this using pip:
$ pip install openpyxl
Then, you can specify the engine argument in the pandas.read_excel method like this:
import pandas as pd
df = pd.read_excel('file.xlsx', engine='openpyxl')
This code will read the Excel file using the openpyxl library, and you will no longer encounter the xlrd.biffh.XLRDError error.
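If your code has to handle both old and new Excel files, one option is to choose the engine from the file extension. This is only a sketch under stated assumptions: pick_engine and read_workbook are hypothetical helper names, and the mapping assumes that xlrd is installed for .xls files and openpyxl for .xlsx and .xlsm files.

```python
from pathlib import Path

# Assumed mapping from file extension to a pandas.read_excel engine name.
ENGINES = {
    ".xls": "xlrd",       # legacy binary format
    ".xlsx": "openpyxl",  # Office Open XML workbook
    ".xlsm": "openpyxl",  # macro-enabled workbook
}


def pick_engine(path):
    """Return the engine name for an Excel file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}")
    return ENGINES[suffix]


def read_workbook(path):
    """Read an Excel file into a DataFrame using the matching engine."""
    import pandas as pd  # imported here so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path))
```

With these helpers, read_workbook('file.xlsx') passes engine='openpyxl' to pandas.read_excel, while read_workbook('legacy.xls') passes engine='xlrd'.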
Conclusion
In this Byte, we’ve learned about the xlrd.biffh.XLRDError error that occurs when using pandas.read_excel to read .xlsx files. We’ve learned why this error occurs and how to fix it by using the openpyxl library.