BUG: Fixed extra blank row in to_excel for MultiIndex with empty names (#27772)#64302
NicoRicardi wants to merge 9 commits into pandas-dev:main
|
hi @kjmin622, is there anything else you need from me? |
|
@NicoRicardi
import pandas as pd
from pandas import DataFrame, MultiIndex
col_mi = MultiIndex.from_tuples([("A", 1), ("A", 2)])
df = DataFrame([[10, 20]], index=[0], columns=col_mi)
df.to_excel("bug.xlsx")
result = pd.read_excel("bug.xlsx", header=None, index_col=None)
print(result) |
|
Hmm, not fully.
Hope this clarifies the reason for my suggestion. I understand wanting to avoid a large refactor for a small fix, but the actual changes here amount to only 3 or 4 edited lines. |
|
@NicoRicardi It seems we might be talking past each other. Sorry I didn't explain the context upfront. The test case I shared is specifically about exporting a DataFrame that has MultiIndex columns and a regular (single-level) index to Excel. The read_excel line is only there as a convenient way to inspect the raw sheet layout. What I expected to see was:
>>> pd.read_excel("bug.xlsx", header=None, index_col=None)
     0   1     2
0  NaN   A   NaN
1  NaN   1   2.0
2  0.0  10  20.0
However, with your PR applied, the actual output becomes:
>>> pd.read_excel("bug.xlsx", header=None, index_col=None)
     0    1     2
0  NaN    A   NaN
1  NaN    1   2.0
2  NaN  NaN   NaN
3  0.0   10  20.0
That discrepancy is what I was referring to. |
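The difference between the two layouts above can also be checked programmatically. The following is only a minimal sketch: `is_blank` and `blank_row_positions` are hypothetical helper names, not pandas functions, and the two grids are copied by hand from the outputs quoted in this comment.

```python
import math


def is_blank(cell):
    """Treat None, empty strings, and float NaN as blank cells."""
    if cell is None or cell == "":
        return True
    return isinstance(cell, float) and math.isnan(cell)


def blank_row_positions(grid):
    """Return the positions of rows in which every cell is blank."""
    return [i for i, row in enumerate(grid) if all(is_blank(c) for c in row)]


nan = float("nan")

# Raw sheet layout expected by the reviewer (header rows, then data).
expected_grid = [
    [nan, "A", nan],
    [nan, 1, 2.0],
    [0.0, 10, 20.0],
]

# Raw sheet layout reported with the PR applied (extra all-NaN row).
actual_grid = [
    [nan, "A", nan],
    [nan, 1, 2.0],
    [nan, nan, nan],
    [0.0, 10, 20.0],
]

print(blank_row_positions(expected_grid))  # []
print(blank_row_positions(actual_grid))    # [2]
```

With a real file, a grid of this shape could be obtained from the `result` frame in the snippet above via `result.values.tolist()`.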
…multiIndex index. Test for this included
…di/pandas into fix-excel-multiindex-empty-row
|
Hi @kjmin622, I have changed the code so that your code snippet works as you intended, and my test checks the behaviour for single/multi columns and indexes (all combinations except single/single). The issue is that this breaks a few other tests. The reason is quite specific: those tests contain a row with label "g" whose values are all NaN. Without the empty row I removed, that row is read back as the index name "g". The fact that the tests cover this specific case suggests that the "empty row" was actually intentional. There is still an inconsistency between to_excel/read_excel and to_csv/read_csv: the specific case of a row of only NaN would cause a roundtrip failure with csv. In my personal view, a first row of only NaN is a rare edge case, and the csv behaviour seems more reasonable. What is your opinion as a pandas dev?
Thank you very much for your help. |
|
First, I think this PR should stay within its original scope and be adjusted in a way that does not break existing roundtrip assumptions. However, this kind of additional behavior change is not something I can decide on my own. I think we should agree on the direction by discussing it with the maintainers (members), either in the existing issue or by opening a new one if needed. |

This PR resolves an issue where an additional blank row was inserted between the header and the data when exporting a DataFrame with a MultiIndex to Excel, specifically when the index levels have no names (None).
Changes:
Modified ExcelFormatter in _base.py to check if index names are actually present before incrementing the row counter for the "header-index" spacer.
Added a robust test case in test_writers.py that inspects the raw grid layout to ensure no "ghost rows" or literal "None" strings are present.
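The conditional described in the first change can be illustrated with a small standalone sketch. This is not the actual ExcelFormatter code: `index_name_spacer_rows` and `row_counter` are hypothetical names used only to show the idea of reserving a spacer row when at least one index level has a real name.

```python
def index_name_spacer_rows(index_names):
    """Hypothetical helper: rows to reserve for index-name labels.

    Returns 1 when at least one index level has a real name, 0 otherwise.
    """
    return 1 if any(name is not None for name in index_names) else 0


# Assumption for illustration: two header rows were already written
# for a two-level column MultiIndex.
row_counter = 2

# Unnamed index levels: no spacer row is added.
row_counter += index_name_spacer_rows([None, None])
print(row_counter)  # 2

# At least one named level: one spacer row is added.
row_counter += index_name_spacer_rows(["outer", None])
print(row_counter)  # 3
```

As the conversation above notes, skipping the spacer row can change how read_excel interprets the first data row, so a check like this trades the blank row against roundtrip behaviour.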
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
AGENTS.md. (used Gemini but with a "human in the loop" approach)