Apache OpenOffice (AOO) Bugzilla – Issue 71757
Opening a file (except office files) as encoded text, crashes OOo
Last modified: 2017-05-20 11:13:18 UTC
1. Open a new Writer document.
2. Insert a music file, such as a *.wma file, via "Insert|File...".
3. In the "ASCII Filter Options" dialog, select "Unicode" for the Character Set item, "Arial" for the Default fonts item, and "Chinese (simplified)" for the Language item.
4. Click the "OK" button.
5. => OpenOffice.org crashes.
The same thing happens when inserting other media formats, such as .mp3, .ram, .rm or .swf files.
MRU->HBRINKM: open a wmv or wma as explained with mentioned options in "Encoded text" dialog -> OO will end without an error message.
target 3.0
taking over
Seems to be heap corruption caused by endless loop (at least in my try with an mp3 file). I get a lot of assertions "What a guess!".
*** Issue 97575 has been marked as a duplicate of this issue. ***
Andreas, are we on track for 3.1 with this issue? Regards, KP.
Inserting a PDF file can also trigger the same problem. Because opening a file (except office files) can make it
The issue has been reproduced on a PC running Windows XP with version DEV300m37.
ama->KP: No, we are not on track for OOo 3.1. Due to our workload and resources (development as well as QA), I have to retarget this issue to OOo 3.2.
Taking over
I think that you can always bring OOo to crash or loop by inserting useless content as "text". The question is how to deal with it. How can we detect whether something is text or just a bunch of bytes, especially if we don't know the encoding? Some options:
- Repeat type detection after the user has entered an encoding, then reject all files that still contain zero bytes or have lines with more than n characters.
- Try language guessing and reject all files whose language can't be detected. This way files could be rejected just because we don't check for their language, though we check for a lot of them.
- The assertions show that at least deep below, in the text formatter, we detect that something is wrong, so at least there we could stop. But that seems very late (the insertion is already done), and it might be tricky to recover from the detected error without creating new problems.
The first two options have the advantage that they try to reject files before they are actually inserted. So far I think the first option is the best one to start with.
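For illustration, the first option could be sketched roughly as below. This is not OOo code: the function name `looks_like_text`, the default line limit, and the reading of "zero bytes" as NUL characters remaining after decoding with the user-selected encoding are all assumptions made for this example.

```python
def looks_like_text(data: bytes, encoding: str, max_line_len: int = 10000) -> bool:
    """Hypothetical pre-insertion check (not part of OOo).

    Rejects input that cannot be decoded with the chosen encoding,
    that still contains NUL characters after decoding, or that has
    a line longer than max_line_len characters.
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the encoding the user selected.
        return False

    # Assumption: "zero bytes" means NUL characters in the decoded text.
    if "\x00" in text:
        return False

    # Binary data rarely contains regular line breaks, so very long
    # "lines" are a strong hint that this is not text.
    return all(len(line) <= max_line_len for line in text.splitlines())
```

A caller would run this after the encoding dialog is confirmed and refuse the insertion when it returns False. The threshold n (here `max_line_len`) would need tuning against real text files.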
IMHO, fixing the code that crashes is the most viable option. Trying to avoid passing "bad" data to code that would crash on it is a band-aid approach.
Andreas, Frank, Oliver, do you see any chance to recover from the situation? It seems that the TextGuess is confused and OOo fails to recognize portions in the text. The result is unpredictable, and basically it is impossible to format such "text" at all. IMHO the only way to prevent that is detecting the error as early as possible and discarding this "document". If that isn't possible before the "text" is inserted, we need a health check or something similar that can find out quickly and easily whether the "text" can be formatted at all.
So here's a proposal. We will extend the filter dialog with a preview, like in Calc where you always can see how the first few rows will look with the current settings. We will have a text engine in that dialog that displays the first few thousand characters of the document using the current settings. This will enable us to test the "text" in a "sandbox". Even if the user presses "OK" though he just sees garbage, we can still run a detection over the previewed text and reject it in case it does not match our criteria. Once the text is in the Writer core it is close to impossible to handle all the problems that might appear deep inside the text formatting or the VCL text output caused by text garbage.
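A rough sketch of the "sandbox" idea, assuming the preview is simply the first few thousand characters decoded with the selected encoding. The names `preview_text` and `preview_is_acceptable`, the 4000-character limit, and the 5% threshold are hypothetical; the thread does not define the actual acceptance criteria.

```python
import codecs
import unicodedata


def preview_text(data: bytes, encoding: str, max_chars: int = 4000) -> str:
    """Decode only the start of the file, as a preview widget would."""
    # Undecodable bytes become U+FFFD instead of raising an error.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    # Four bytes per character is enough for the common encodings;
    # final=False holds back a character cut off by the slice.
    chunk = data[: max_chars * 4]
    return decoder.decode(chunk, final=False)[:max_chars]


def preview_is_acceptable(preview: str, max_bad_ratio: float = 0.05) -> bool:
    """Assumed criterion: few replacement or control characters."""
    if not preview:
        return True
    bad = 0
    for ch in preview:
        if ch in "\t\n\r":
            continue  # ordinary whitespace controls are fine
        # Cc = control, Cn = unassigned, Co = private use, Cs = surrogate
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Cn", "Co", "Cs"):
            bad += 1
    return bad / len(preview) <= max_bad_ratio
```

Because only the previewed prefix is decoded and checked, the test stays cheap for large files, and a rejected file never reaches the Writer core.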
Reset assignee to the default "issues@openoffice.apache.org".