Discussion:
wstring code generation error in wipfc
Steven Levine
2013-05-09 05:45:06 UTC
Hi,

We have identified a code generation error in wipfc. word.cpp:55
contains code of the form

    if ( std::iswpunct( entity ) )
        break;
    else {
        // 2013-05-05 SHL avoid code gen error - deletes txt not temp wstring
    # if 0
        // 2013-05-05 SHL deletes txt not wstring temporary
        txt += entity;
    # else
        std::wstring s;
        s = entity;
        txt += s;
    # endif

The #if 0 case is the original code that fails. The #else case is a
workaround. The original

txt += entity;

fails because the generated code deletes the txt wstring rather than
the temporary wstring built to hold the entity for append to the txt
wstring.
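
For readers following along, the two forms can be put side by side in isolation. This is only a minimal sketch, not the wipfc source: the function names append_direct, append_via_temp and check_forms_agree are hypothetical, and the sketch assumes nothing beyond the standard basic_string interface. With a conforming compiler and library both forms should yield the same string, so a mismatch would point at the toolchain rather than at the calling code.

```cpp
#include <cassert>
#include <string>

// Original form from the post: append the wchar_t directly to the wstring.
std::wstring append_direct(std::wstring txt, wchar_t entity)
{
    txt += entity;
    return txt;
}

// Workaround form from the post: copy the wchar_t into a named wstring first,
// then append that wstring.
std::wstring append_via_temp(std::wstring txt, wchar_t entity)
{
    std::wstring s;
    s = entity;
    txt += s;
    return txt;
}

// Hypothetical helper: aborts if the two forms ever disagree.
void check_forms_agree(const std::wstring& prefix, wchar_t entity)
{
    assert(append_direct(prefix, entity) == append_via_temp(prefix, entity));
}
```

Calling check_forms_agree( L"M", L'\u00f6' ) exercises the same append that the M&oe;ller input triggers. Whether so small a case reproduces the reported failure is not established in this thread.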

This code executes to process items like

M&oe;ller

in document text. There is similar code in hn.cpp, but the code is
similar to my workaround, so the generated code is correct.

My question is do we have a code generation problem in wpp386 or is
there a subtle error in the basic_string templates in the string
include? I don't do enough C++ to be completely fluent in templates
of this complexity.

This was discovered when running wipfc.exe on eCS/OS2, but I would
expect the same failure on other platforms.

Thanks,

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
Peter C. Chapin
2013-05-09 12:01:47 UTC
Post by Steven Levine
The #if 0 case is the original code that fails. The #else case is a
workaround. The original
txt += entity;
fails because the generated code deletes the txt wstring rather than
the temporary wstring built to hold the entity for append to the txt
wstring.
Can you demonstrate this effect with a smaller and more isolated test case?

Peter
Steven Levine
2013-05-09 17:25:20 UTC
On Thu, 9 May 2013 12:01:47 UTC, "Peter C. Chapin"
<***@vtc.vsc.edu> wrote:

Hi Peter,
Post by Peter C. Chapin
Can you demonstrate this effect with a smaller and more isolated test case?
Not yet. My efforts so far have been ineffective.

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
Paul S. Person
2013-05-10 16:39:53 UTC
On Thu, 9 May 2013 17:25:20 +0000 (UTC), "Steven Levine"
Post by Steven Levine
On Thu, 9 May 2013 12:01:47 UTC, "Peter C. Chapin"
Hi Peter,
Post by Peter C. Chapin
Can you demonstrate this effect with a smaller and more isolated test case?
Not yet. My efforts so far have been ineffective.
This code executes to process items like
M&oe;ller
in document text.
which is: in the statement
Post by Steven Levine
txt += entity;
is "M&oe;ller" contained in "txt" or in "entity"? And what is the
content of the other string?

I hate getting only half the story!

(Presumably, "M&oe;ller" is intended to become "Möller" at some point
in further processing, at least when processed by some target help
compilers.)
--
"Nature must be explained in
her own terms through
the experience of our senses."
Paul S. Person
2013-05-10 16:45:37 UTC
On Fri, 10 May 2013 09:39:53 -0700, Paul S. Person
Post by Paul S. Person
On Thu, 9 May 2013 17:25:20 +0000 (UTC), "Steven Levine"
Post by Steven Levine
On Thu, 9 May 2013 12:01:47 UTC, "Peter C. Chapin"
Hi Peter,
Post by Peter C. Chapin
Can you demonstrate this effect with a smaller and more isolated test case?
Not yet. My efforts so far have been ineffective.
This code executes to process items like
M&oe;ller
in document text.
which is: in the statement
Post by Steven Levine
txt += entity;
is "M&oe;ller" contained in "txt" or in "entity"? And what is the
content of the other string?
I hate getting only half the story!
My bad!
Post by Paul S. Person
(Presumably, "M&oe;ller" is intended to become "Möller" at some point
in further processing, at least when processed by some target help
compilers.)
I read "whpcvt" instead of "wipfc". There are no "target help
compilers" here!
--
"Nature must be explained in
her own terms through
the experience of our senses."
Steven Levine
2013-05-11 21:08:40 UTC
Paul S. Person
2013-05-12 16:56:14 UTC
On Sat, 11 May 2013 21:08:40 +0000 (UTC), "Steven Levine"
On Fri, 10 May 2013 16:39:53 UTC, Paul S. Person
Hi Paul,
Post by Paul S. Person
Post by Steven Levine
This code executes to process items like
M&oe;ller
in document text.
which is: in the statement
Post by Steven Levine
txt += entity;
is "M&oe;ller" contained in "txt" or in "entity"? And what is the
content of the other string?
Neither. Txt wstring contains M and the entity wchar_t contains ”.
Eventually the parser will append the "ller" wstring to the txt
wstring.
If I understand the problem correctly, txt ends up with "”" as the
wstring containing "M”" is deleted and the temporary kept. If so, is
this unique to "”" or does the same thing happen with each letter in
"ller"? What do you actually end up with after "M&oe;ller" has been
processed?

(I presume that "”" is the result of "&oe;"; here, it looks like a
thick "l" but presumably it looks like "ö" on your computer.)
--
"Nature must be explained in
her own terms through
the experience of our senses."
Steven Levine
2013-05-12 17:37:49 UTC
Paul S. Person
2013-05-13 17:06:41 UTC
On Sun, 12 May 2013 17:37:49 +0000 (UTC), "Steven Levine"
On Sun, 12 May 2013 16:56:14 UTC, Paul S. Person
Hi Paul,
If I understand the problem correctly, txt ends up with "”" as the
wstring containing "M”" is deleted and the temporary kept.
There's more to it than that. The heap gets corrupted because the
counts in the wstring class are out of sync with the buffer content.
Depending on the .ipf content, the output will be corrupted or wipfc
will crash.
(I presume that "”" is the result of "&oe;"; here, it looks like a
thick "l" but presumably it looks like "ö" on your computer.)
I suspect the "" you are seeing means you got corruption rather than
a crash.
The "" was, in fact, copied from your message as displayed here.
Presumably, it was something else when you typed it -- on your
computer. It is, in other words, some sort of font problem, but not
really anything to worry about. I only wrote the parenthesized bit in
case it wasn't clear what I was referring to. I guess I could have
been clearer.

I see from another post an indication that the problem was at least
identified (and, I hope, solved) using heapcheck, which is good.
--
"Nature must be explained in
her own terms through
the experience of our senses."
Steven Levine
2013-05-13 21:09:03 UTC
On Mon, 13 May 2013 17:06:41 UTC, Paul S. Person
<***@ix.netscom.com.invalid> wrote:

Hi Paul,
Post by Paul S. Person
The "" was, in fact, copied from your message as displayed here.
Hmm. Looks like the news client was configured to post US-ASCII.
It's back to Latin-1, which is what I thought it was set to. Let's see
what happens to the character translation now. &oe; is ö (umlauted
lowercase o).
Post by Paul S. Person
I see from another post an indication that the problem was at least
identified (and, I hope, solved) using heapcheck, which is good.
It is not solved. The code I posted is a workaround that avoids the
failure. As I mentioned originally, this is either a defect in the
wstring template definition or the wpp386 code generation.

The following testcase demonstrates the failure:

.* entity testcase
:userdoc.
:h1 res=30000 id='Description'.
M&oe.ller
M&oe.ller
.br
:euserdoc.
.* eof

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------
Paul S. Person
2013-05-14 17:23:06 UTC
On Mon, 13 May 2013 21:09:03 +0000 (UTC), "Steven Levine"
Post by Steven Levine
On Mon, 13 May 2013 17:06:41 UTC, Paul S. Person
Hi Paul,
Post by Paul S. Person
The "" was, in fact, copied from your message as displayed here.
Hmm. Looks like the news client was configured to post US-ASCII.
It's back to Latin-1, which is what I thought it was set to. Let's see
what happens to the character translation now. &oe; is ö (umlauted
lowercase o).
If you are asking whether I am seeing o-umlaut here now, the answer is
"yes".
Post by Steven Levine
Post by Paul S. Person
I see from another post an indication that the problem was at least
identified (and, I hope, solved) using heapcheck, which is good.
It is not solved. The code I posted is a workaround that avoids the
failure. As I mentioned originally, this is either a defect in the
wstring template definition or the wpp386 code generation.
I am sorry to hear that fixing the heap problem did not solve this one
as well. Or are you saying that the heap problem has been verified but
not fixed yet?
--
"Nature must be explained in
her own terms through
the experience of our senses."
Steven Levine
2013-05-14 19:33:46 UTC
On Tue, 14 May 2013 17:23:06 UTC, Paul S. Person
<***@ix.netscom.com.invalid> wrote:

Hi,
Post by Paul S. Person
If you are asking whether I am seeing o-umlaut here now, the answer is
"yes".
Better.
Post by Paul S. Person
I am sorry to hear that fixing the heap problem did not solve this one
as well. Or are you saying that the heap problem has been verified but
not fixed yet?
Not quite. There is no heap problem per se. The problem is that bad
code generation in wpp386 can corrupt the heap. Sometimes the bad code
just corrupts data stored in the heap. The difference is that the
former causes traps and the latter causes incorrect output in the .inf
or .hlp file.
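
One way to look for the second kind of damage, where the string data is wrong but nothing traps, is a consistency check on the wstring after each append. The following is only a sketch under stated assumptions: wstring_looks_consistent is a hypothetical diagnostic that is not part of wipfc, and it assumes the text contains no embedded L'\0' characters, so that the stored length should match the length found by scanning the buffer.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical diagnostic, not part of wipfc.
// Assumption: the string is not expected to hold embedded L'\0' characters.
// Returns false when the recorded length disagrees with the buffer content.
bool wstring_looks_consistent(const std::wstring& s)
{
    // The recorded length can never legitimately exceed the capacity.
    if (s.size() > s.capacity())
        return false;

    // Scan the buffer up to its terminator and compare with the stored length.
    return std::wcslen(s.c_str()) == s.size();
}
```

If the object is already badly corrupted, the check itself may trap, so it can at best narrow down where the counts first go out of sync. It does not show whether the templates or the code generator are at fault.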

What I have is a modification for wipfc that will avoid the bad code
generation for this specific case. This will be committed to the
depot in the fullness of time.

The potential for bad code generation in wpp386 probably still
exists and will continue to exist until I or someone else isolates
the source of the bad code generation.

Steven
--
---------------------------------------------------------------------
Steven Levine <***@earthlink.bogus.net>
eCS/Warp/DIY etc. www.scoug.com www.ecomstation.com
---------------------------------------------------------------------