Apache OpenOffice (AOO) Bugzilla – Issue 128551
XSLX import: missing all text
Last modified: 2023-01-06 18:42:14 UTC
I have a bunch of huge XSLX spreadsheets (100000+ rows) from a 3rd party, and they all show no text at all after opening. All the sheets are present, a picture is present, some column background are present, but there is no text, no numbers, no dates, nothing: only empty cells everywhere. Excel and LibreOffice open them perfectly. The fix for bug 126720 doesn't help here - it's a different issue. I am not allowed to distribute these files, so I am using this bug to keep my own notes while debugging OpenOffice.
Oh and Apache POI also opens them perfectly and is able to extract data just fine.
Maybe sharedStrings.xml isn't accessible for some reason, like in bug 126720, so strings can't be loaded. Nope, debugging source/xls/workbookfragment.cxx:208 like I did in bug 126720 and going a bit further along oox::xls::WorkbookFragment::finalizeImport(), then doing: ---snip--- (gdb) print getSharedStrings() ... std::vector of length 123253 ... ---snip--- so all the shared strings are loaded. I can't look at their contents though, the RichString class doesn't make that easy. But instead of looking at where data is coming in, let me look at where data is going out into the spreadsheet. Various methods on the SheetDataBuffer class look like they populate the spreadsheet. A breakpoint on "SheetDataBuffer::setStringCell" never triggers while loading this document, while it does trigger while loading other documents that do show cell contents. So the problem appears to be that some internal flow of data is broken, which is stopping the parsed data from reaching the spreadsheet.
When SheetDataBuffer::setStringCell() is called on a working document, this is the stack trace to it: ---snip--- #0 oox::xls::SheetDataBuffer::setStringCell(oox::xls::CellModel const&, int) (this=0x80dcd1d20, rModel=..., nStringId=0) at source/xls/sheetdatabuffer.cxx:373 #1 0x000000080eac9f43 in oox::xls::SheetDataContext::onEndElement() (this=0x80dfa3980) at source/xls/sheetdatacontext.cxx:223 #2 0x000000080e711c63 in oox::core::ContextHandler2Helper::implEndElement(int) (this=0x80dfa39c0, nElement=852948) at source/core/contexthandler2.cxx:120 #3 0x000000080e71220f in oox::core::ContextHandler2::endFastElement(int) (this=0x80dfa3980, nElement=852948) at source/core/contexthandler2.cxx:209 #4 0x000000080ee3f4cc in sax_fastparser::FastSaxParser::callbackEndElement(char const*) (this=0x80dbd27c0) at source/fastparser/fastparser.cxx:849 #5 0x0000000807afed94 in () at /usr/local/lib/libexpat.so.1 #6 0x0000000807afbd37 in () at /usr/local/lib/libexpat.so.1 #7 0x0000000807afaaf9 in () at /usr/local/lib/libexpat.so.1 #8 0x0000000807af7567 in () at /usr/local/lib/libexpat.so.1 #9 0x0000000807af6d8b in XML_ParseBuffer () at /usr/local/lib/libexpat.so.1 #10 0x000000080ee3e39f in sax_fastparser::FastSaxParser::parse() (this=this@entry=0x80dbd27c0) at source/fastparser/fastparser.cxx:646 ---snip--- Frame #1, oox::xls::SheetDataContext::onEndElement(), is helpful. ---snip--- 167 void SheetDataContext::onEndElement() 168 { 169 if( getCurrentElement() == XLS_TOKEN( c ) ) 170 { 171 // try to create a formula cell 172 if( mbHasFormula ) switch( maFmlaData.mnFormulaType ) 173 { ... 205 if( !mbHasFormula ) 206 { 207 // no formula created: try to set the cell value 208 if( maCellValue.getLength() > 0 ) switch( maCellData.mnCellType ) 209 { 210 case XML_n: 211 mrSheetData.setValueCell( maCellData, maCellValue.toDouble() ); 212 break; 213 case XML_b: 214 mrSheetData.setBooleanCell( maCellData, maCellValue.toDouble() != 0.0 ); 215 break; 216 case XML_e: 217 mrSheetData.setErrorCell( maCellData, maCellValue ); 218 break; 219 case XML_str: 220 mrSheetData.setStringCell( maCellData, maCellValue ); 221 break; 222 case XML_s: 223 mrSheetData.setStringCell( maCellData, maCellValue.toInt32() ); 224 break; 225 } 226 else if( (maCellData.mnCellType == XML_inlineStr) && mxInlineStr.get() ) 227 { 228 mxInlineStr->finalizeImport(); 229 mrSheetData.setStringCell( maCellData, mxInlineStr ); 230 } 231 else 232 { 233 // empty cell, update cell type 234 maCellData.mnCellType = XML_TOKEN_INVALID; 235 mrSheetData.setBlankCell( maCellData ); 236 } 237 } ---snip--- Putting a breakpoint on line 172 with the condition "maCellValue.getLength() > 0" shows it is never triggered, and stepping through shows we always end in lines 234-235, with the "empty cell, update cell type" comment. So what on earth is going wrong with maCellValue?
When I compared the data between a working file and a broken file, I immediately saw the problem. Good file: ---snip--- <sheetData> <row r="1" ht="16" customHeight="true"> <c r="B1" s="2" t="s"> <v>0</v> </c> <c r="C1" s="2" t="e"/> <c r="D1" s="2" t="e"/> <c r="E1" s="2" t="e"/> </row> ---snip--- Bad file: ---snip--- <sheetData> <row ht="87" customHeight="1"> <c s="114"/> <c s="33"/> <c s="33"/> <c s="121" t="s"> <v>53879</v> </c> <c r="G1" s="33"/> <c s="33"/> <c s="82"/> </row> ---snip--- The bad file has no r="..." attributes on its cells (<c> elements). Manually adding them gets the cells to show. This is a duplicate of bug 127672. *** This issue has been marked as a duplicate of issue 127672 ***