Fix Python – Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Question

Asked By – HaPsantran

I have a large spreadsheet file (.xlsx) that I’m processing using python pandas. It happens that I need data from two tabs (sheets) in that large file. One of the tabs has a ton of data and the other is just a few square cells.

When I use pd.read_excel() on any worksheet, it looks to me like the whole file is loaded (not just the worksheet I’m interested in). So when I use the method twice (once for each sheet), I effectively have to suffer the whole workbook being read in twice (even though we’re only using the specified sheet).

How do I only load specific sheet(s) with pd.read_excel()?

Answer

Try pd.ExcelFile:

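import pandas as pd

# Open the workbook once, then read each sheet from the same ExcelFile object
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, sheet_name='Sheet1')
df2 = pd.read_excel(xls, sheet_name='Sheet2')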

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn’t appear to be a way around this). The only saving is that the file is read once, rather than once for every sheet you access.
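As a minimal sketch of that pattern, the snippet below writes a small throwaway workbook and then reads two sheets from a single ExcelFile handle. The file name demo.xlsx and the sheet names "big" and "small" are made up for illustration, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical workbook, created only so the example can run on its own.
    path = os.path.join(tmp_dir, "demo.xlsx")
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"a": range(5)}).to_excel(writer, sheet_name="big", index=False)
        pd.DataFrame({"b": [1, 2]}).to_excel(writer, sheet_name="small", index=False)

    # Open the workbook once; both reads below reuse the same handle.
    with pd.ExcelFile(path) as xls:
        sheet_names = xls.sheet_names                  # sheet names in workbook order
        df_big = pd.read_excel(xls, sheet_name="big")
        df_small = xls.parse("small")                  # ExcelFile.parse reads a sheet too
```

Using ExcelFile as a context manager closes the underlying file when the block ends, which is convenient when the workbook is large.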

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), a zero-based integer giving the sheet's position (e.g. 0, 1), a list of sheet names or positions, or None. If a list is provided, it returns a dictionary where the keys are the sheet names or positions you passed and the values are the data frames. The default, sheet_name=0, simply returns the first sheet.

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.
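The dictionary-returning forms can be illustrated with a short sketch. The workbook and its sheet names ("first", "second", "third") are hypothetical and exist only to make the example runnable; it again assumes an Excel engine such as openpyxl is available.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical three-sheet workbook, one single-cell column per sheet.
    path = os.path.join(tmp_dir, "demo.xlsx")
    with pd.ExcelWriter(path) as writer:
        for value, name in enumerate(["first", "second", "third"], start=1):
            pd.DataFrame({"x": [value]}).to_excel(writer, sheet_name=name, index=False)

    # A list may mix names and zero-based positions; the keys of the
    # returned dict are the items that were passed in.
    subset = pd.read_excel(path, sheet_name=["first", 2])

    # None loads every sheet, keyed by sheet name.
    everything = pd.read_excel(path, sheet_name=None)
```

Here subset["first"] and subset[2] hold the first and third sheets, while everything["second"] is available only in the sheet_name=None result.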

This question is answered By – Noah

This answer was collected from Stack Overflow and reviewed by FixPython community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.