BUG: Fixed extra blank row in to_excel for MultiIndex with empty names (#27772) #64302

Open
NicoRicardi wants to merge 9 commits into pandas-dev:main from NicoRicardi:fix-excel-multiindex-empty-row

Conversation

@NicoRicardi

This PR resolves an issue where an additional blank row was inserted between the header and the data when exporting a DataFrame with a MultiIndex to Excel, specifically when the index levels have no names (None).

Changes:

Modified ExcelFormatter in _base.py to check if index names are actually present before incrementing the row counter for the "header-index" spacer.

Added a robust test case in test_writers.py that inspects the raw grid layout to ensure no "ghost rows" or literal "None" strings are present.

@NicoRicardi force-pushed the fix-excel-multiindex-empty-row branch from c345a82 to 0956e34 on February 25, 2026 06:41
@NicoRicardi
Author

Hi @kjmin622, is there anything else you need from me?
As you can see, the checks have passed.
Best,
Nico Ricardi

@kjmin622
Contributor

@NicoRicardi
I believe this fix should also work correctly for the code below.

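import pandas as pd
from pandas import DataFrame, MultiIndex

# Two-level column header with a plain, unnamed single-level index
col_mi = MultiIndex.from_tuples([("A", 1), ("A", 2)])
df = DataFrame([[10, 20]], index=[0], columns=col_mi)
df.to_excel("bug.xlsx")
# Read back without header or index parsing to see the raw sheet layout
result = pd.read_excel("bug.xlsx", header=None, index_col=None)
print(result)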
@NicoRicardi
Author

Hmm, not fully.
Let me point out a few things. As I am sure you know better than I do, pandas treats the index-names row as optional when reading, so a DataFrame can be written and re-read without issues both before and after my suggested changes.
Nonetheless:

  1. Someone might need to read the Excel file with other tools (for instance, I am working on tools to handle files for dataset publication on Zenodo, and users may read the processed Excel files with any other tool or software). For such readers, I believe the empty row is redundant and potentially misleading.
  2. Writing to CSV does not introduce this empty row when the index names are None, and I think writing to Excel and writing to CSV should behave as similarly as possible.
  3. In your code snippet, "result" and "df" have totally different headers and indexes.
  4. Technically, one could reset the index, write to file, and then read the file back appropriately. But: a) fixes are better than workarounds, and b) if one uses a Styler (for instance, for bold index and header cells), the style will not be respected.

I hope this clarifies the reason for my suggestion. I understand the wish to avoid large refactorings for small things, but the actual change here is only about 3 or 4 edited lines.
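
Point 2 can be checked directly with to_csv. The following is a minimal sketch that reuses the DataFrame from the snippet earlier in this thread (MultiIndex columns, unnamed single-level index). The exact text of the header cells may differ between pandas versions, so only the number of rows and the data row are inspected here.

```python
import pandas as pd

# MultiIndex columns with an unnamed, single-level index
col_mi = pd.MultiIndex.from_tuples([("A", 1), ("A", 2)])
df = pd.DataFrame([[10, 20]], index=[0], columns=col_mi)

# With no path argument, to_csv returns the CSV text as a string
csv_lines = df.to_csv().splitlines()
for line in csv_lines:
    print(line)

# Two header rows followed directly by the data row, with no spacer line
print(len(csv_lines))
```

If the count printed at the end is 3, the CSV writer emitted no index-names row for this frame, which is the behaviour the PR tries to mirror for Excel.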

@kjmin622
Contributor

kjmin622 commented Feb 27, 2026

@NicoRicardi It seems we might be talking past each other. Sorry I didn't explain the context upfront.

The test case I shared is specifically about exporting a DataFrame that has MultiIndex columns and a regular (single-level) index to Excel. The read_excel line is only there as a convenient way to inspect the raw sheet layout.

What I expected to see was:

>>> pd.read_excel("bug.xlsx", header=None, index_col=None)
     0    1     2
0  NaN    A   NaN
1  NaN    1   2.0
2  0.0   10  20.0

However, with your PR applied, the actual output becomes:

>>> pd.read_excel("bug.xlsx", header=None, index_col=None)
     0    1     2
0  NaN    A   NaN
1  NaN    1   2.0
2  NaN  NaN   NaN
3  0.0   10  20.0

That discrepancy is what I was referring to.
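
One way to flag such a spacer row programmatically is to look for rows of the raw grid in which every cell is empty. This is only a sketch: `blank_rows` is a hypothetical helper (it is not part of pandas or of this PR), and the two grids are typed in by hand from the outputs above rather than read from a file.

```python
import numpy as np
import pandas as pd


def blank_rows(raw: pd.DataFrame) -> list:
    """Return the positions of rows in which every cell is NaN."""
    mask = raw.isna().all(axis=1).to_numpy()
    return np.flatnonzero(mask).tolist()


nan = np.nan
# Layout expected after the fix: header rows directly followed by data
expected = pd.DataFrame([[nan, "A", nan], [nan, 1, 2.0], [0.0, 10, 20.0]])
# Layout observed with the PR applied: one fully empty row before the data
actual = pd.DataFrame(
    [[nan, "A", nan], [nan, 1, 2.0], [nan, nan, nan], [0.0, 10, 20.0]]
)

print(blank_rows(expected))  # []
print(blank_rows(actual))    # [2]
```

The same helper could be applied to the frame returned by `pd.read_excel(..., header=None, index_col=None)` to turn the visual comparison into an assertion.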

@NicoRicardi
Author

NicoRicardi commented Feb 27, 2026

Hi @kjmin622,
thank you for your clarification. I initially thought you were arguing about the usefulness of the PR, sorry about that.

I have changed the code so that your code snippet works as you intended, and my test now checks the behaviour for single- and multi-level columns and indexes (all combinations except single/single).

The issue is that this breaks a few other tests. The reason is quite specific: those tests contain a row with label "g" whose values are all NaN. Without the empty row I removed, this row is read back as the index name "g". The fact that the tests cover this specific case suggests that the "empty row" was actually intentional.

There is still an inconsistency between to_excel/read_excel and to_csv/read_csv: the specific case of a first row containing only NaN would cause a "roundtrip" failure with CSV.
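
A small experiment along these lines, as a sketch only: the frame below is hypothetical and merely mimics the "g" case described above (a first row whose values are all NaN). How read_csv classifies that row is left for the reader to observe rather than asserted here, since it depends on the parser's header inference.

```python
import io

import numpy as np
import pandas as pd

col_mi = pd.MultiIndex.from_tuples([("A", 1), ("A", 2)])
df = pd.DataFrame(
    [[np.nan, np.nan], [1.0, 2.0]],
    index=["g", "a"],
    columns=col_mi,
)

# Write to an in-memory string: two header rows, then the two data rows
csv_text = df.to_csv()
print(csv_text)

# Read back with the same structural hints that describe the written frame
roundtrip = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
print(roundtrip)

# True only if the all-NaN "g" row survived as a data row
print(roundtrip.shape == df.shape)
```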

In my personal view, a first row containing only NaN is a rare edge case, and the CSV behaviour seems more reasonable.
Maybe some kind of parameter could be added to tune this behaviour? Reading still requires some knowledge about the file, because not everything can be inferred, and roundtrips only work when that knowledge is supplied at read time. Pandas is trying to squeeze a much more flexible object into less flexible formats (Excel/CSV), and I fear that there is no perfect solution.

What is your opinion as a pandas dev?
What should we do?

  1. Discard these changes, since the current behaviour is the intended one, and enforce the same behaviour for CSV files.
  2. Implement the changes, but add a parameter when reading (e.g. has_index_name) to tune the behaviour, and introduce the same behaviour for CSV files.
  3. Something else I cannot think of?

Thank you very much for your help.
Best,
Nico Ricardi

@kjmin622
Contributor

First, I think this PR should stay within its original scope and be adjusted in a way that does not break existing roundtrip assumptions.
That said, if resolving this issue inevitably involves some kind of trade-off, my personal view is that aligning the behavior with CSV would be the more intuitive direction. (Whichever option we choose, I think it is best to keep Excel and CSV as consistent with each other as possible.)

However, this kind of additional behavior change is not something I can decide on my own. I think we should agree on the direction by discussing it with the maintainers (members), either in the existing issue or by opening a new one if needed.
