Discussion:
Temporaries overlapped on stack
Ricardo E. Gayoso
2012-11-23 00:16:52 UTC
Hi,
I found this bug, which looks compiler and/or STL related.
Of the two command lines in the comment below, the second one (the optimized build) is the one that produces the crash.
Thanks,
Ricardo

/*
OpenWatcom 1.9 crash with optimized exe (debugging exe runs ok).
Watcom 11.0c with STLport does the same.
MS VC8 does the same.

See reason in http://blogs.stonesteps.ca/showpost.aspx?pid=16

wcl386 -d2 -w4 -zq -zp1 -xs /"op st=1m" dflt_cons_ow
wcl386 -onaxet -5r -w4 -bm -bt=nt -mf -zq -zp1 -dNDEBUG -br -fp5 -oh -xs
/"op st=1m" dflt_cons_ow
*/

#include <cstdio>   // printf
#include <cstring>  // memset
#include <string>
#include <vector>

using namespace std;

// Simple struct with a default constructor
struct Pepe {
    Pepe() {
        a = 10;
        memset(b, 1, sizeof(b));
    }

    int a;
    char b[200000];
};

// Holder to simplify the use of new
struct Large {
    vector<Pepe> v1;
};

void resize1(Large *p) // Crash !!!!
{
    p->v1.resize(100);
}

void tst()
{
    // Alloc large struct
    Large *p = new Large;

    // Resize. Crashes!!!
    resize1(p);
}

int main()
{
    tst();

    printf("end\n");

    return 0;
}
Marty Stanquist
2012-11-24 06:34:28 UTC
I'm currently looking into this, also reviewing Andre's Blog. Would
temporarily disabling optimization for this particular module be an
acceptable workaround?

Marty
Marty Stanquist
2012-11-24 09:04:30 UTC
When compiling the program in the IDE using all defaults (no optimization)
and linker option st=1m, the program executes normally. The program also
executes normally from a Windows 7 command shell (DOS box). However, when
optimization setting -ox (average space and time) is used, the program
hangs.

void resize1(Large *p)   // added a space between Large and *p
{
    p->v1.resize(100);   // executes normally with no optimization
}

Here is the IDE log file:

cd C:\SwDev\Test
wmake -f C:\SwDev\Test\CppTest.mk -h -e C:\SwDev\Test\debug\CppTest.exe
cd C:\SwDev\Test\debug
wpp386 ..\cpp\CppTest1.cpp -i="C:\Watcom/h;C:\Watcom/h/nt" -w4 -e25 -zq -od -d2 -6r -bt=nt -fo=.obj -mf -xs -xr
wlink name CppTest d all sys nt op st=1m op m op maxe=25 op q op symf @CppTest.lk1
Execution complete

You might want to try turning off optimization for now. I will continue
researching the issue addressed in the blog, especially with regard to
optimization.

Marty
Ricardo E. Gayoso
2012-11-26 19:02:32 UTC
By changing the size of the char b[] member, I found these thresholds:
WC 11.0c: 34711 ok, 34712 bad
OW 1.9: 34736 ok, 34737 bad

These sizes are very close to 32 Kbytes. Could the logic used to inline
functions and place objects on the stack be using 16-bit arithmetic?
(See the good article at the link.)

All this seems to be related to a STL design choice that creates a temporary
object on the stack, in this case a 32K object.
So even though a vector<Pepe> itself uses just a few bytes (all the storage
is allocated with new), resize() consumes a lot of stack.
Moreover, I don't know whether, after the full inlining process, several
temporaries end up allocated on the stack.
As a workaround, I switched from vector<Pepe> to vector<Pepe*>. This
reduces the size of the temporary from 32K to 4 bytes.
But I am not sure whether this is just hiding the bug...
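
For reference, the vector<Pepe*> workaround could look roughly like the sketch below. The helper names grow_to and destroy_all are made up for illustration and are not part of the original test program; the point is only that resize() then handles a pointer-sized temporary while each Pepe is built directly on the heap.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Same shape as the Pepe struct in the test program.
struct Pepe {
    Pepe() : a(10) { std::memset(b, 1, sizeof(b)); }
    int  a;
    char b[200000];
};

// Hypothetical helper: grow a vector of owning raw pointers to n elements.
// resize() only copies a null Pepe*; every Pepe is constructed with new.
void grow_to(std::vector<Pepe*>& v, std::size_t n)
{
    std::size_t old_size = v.size();
    if (n <= old_size)
        return;
    v.resize(n, static_cast<Pepe*>(0));
    for (std::size_t i = old_size; i < n; ++i)
        v[i] = new Pepe;
}

// Hypothetical helper: delete every element and empty the vector.
void destroy_all(std::vector<Pepe*>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        delete v[i];   // deleting a null pointer is harmless
    v.clear();
}
```

The cost is that the vector no longer owns its elements, so something like destroy_all has to be called before the vector goes away.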
Peter C. Chapin
2012-11-26 22:40:21 UTC
Post by Ricardo E. Gayoso
All this seems to be related to a STL design choice that creates a temporary
object on the stack, in this case a 32K object.
When a vector is resized a temporary default constructed object of type
T has to be stored somewhere. If sizeof(T) is large a lot of memory will
be consumed for this. I see two possible approaches:

1. Isn't it possible to increase the amount of stack space available to
the program at compile (or link) time? Does that help?

2. The implementation of std::vector::resize(size_type sz) could perhaps
allocate the default constructed T on the heap instead. That creates
other complications though, so I'm not sure it's worth it. Certainly for
the case of T == int (for example), it wouldn't be.
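
A minimal sketch of the second approach, written as a free function rather than as a change to any real vector header. The name resize_with_heap_fill is hypothetical, and std::unique_ptr assumes a C++11 compiler (on Open Watcom 1.9 a plain new/delete pair with exception handling would be needed instead).

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in for a large element type such as Pepe.
struct Big {
    Big() : a(10) { std::memset(b, 1, sizeof(b)); }
    int  a;
    char b[200000];
};

// Hypothetical replacement for v.resize(n): the default-constructed
// fill value lives on the heap, not in this function's stack frame.
template <class T>
void resize_with_heap_fill(std::vector<T>& v, std::size_t n)
{
    typedef typename std::vector<T>::difference_type diff_t;
    if (n > v.size()) {
        std::unique_ptr<T> fill(new T());        // heap-allocated fill value
        v.insert(v.end(), n - v.size(), *fill);  // passed by const reference
    } else if (n < v.size()) {
        v.erase(v.begin() + static_cast<diff_t>(n), v.end());
    }
}
```

This only moves the one temporary that this function controls. Whether insert() makes further stack copies of the value internally depends on the library, which is one of the complications mentioned above.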

Peter
Marty Stanquist
2012-11-27 09:30:59 UTC
Our situation is a bit different than what is described in Andre's blog.
Microsoft has 2 resize methods. We just have one, the second one in the
article (see below).

<excerpt from Open Watcom vector header file>

// resize( size_type, Type )
// *************************
template< class Type, class Allocator >
void vector< Type, Allocator >::resize( size_type n, Type c )
{
    if( n > vec_length )
        insert( end( ),
                static_cast<size_type>(n - vec_length),
                static_cast<const Type &>( c ) );
    else if ( n < vec_length )
        erase( begin( ) + n, end( ) );
}

Additionally, the error occurs before this method actually gets called at
the assembly code level. Here is a disassembly listing for procedure resize1
showing the error at offset 0065. The compile was run with optimization
flag -ox (average space and time) enabled.

Segment: _TEXT BYTE USE32 000001B5 bytes
0000 void near resize1( Large near * ):
0000 53 push ebx
0001 51 push ecx
0002 52 push edx
0003 56 push esi
0004 57 push edi
0005 55 push ebp
0006 89 E5 mov ebp,esp
0008 81 EC 7C BE 00 00 sub esp,0x0000be7c
000E L$1:
000E 89 45 F4 mov dword ptr -0xc[ebp],eax
0011 8D 85 BC A0 FF FF lea eax,-0x5f44[ebp]
0017 89 45 F8 mov dword ptr -0x8[ebp],eax
001A C7 85 BC A0 FF FF 0A 00 00 00
mov dword ptr -0x5f44[ebp],0x0000000a
0024 BB 31 5F 00 00 mov ebx,0x00005f31
0029 BA 01 00 00 00 mov edx,0x00000001
002E 8D 85 C0 A0 FF FF lea eax,-0x5f40[ebp]
0034 E8 00 00 00 00 call memset_
0039 8D 85 BC A0 FF FF lea eax,-0x5f44[ebp]
003F 89 45 FC mov dword ptr -0x4[ebp],eax
0042 8B 75 FC mov esi,dword ptr -0x4[ebp]
0045 B9 CE 17 00 00 mov ecx,0x000017ce
004A 8D BD 84 41 FF FF lea edi,-0xbe7c[ebp]
0050 F3 A5 rep movsd
0052 B9 CE 17 00 00 mov ecx,0x000017ce
0057 81 EC 38 5F 00 00 sub esp,0x00005f38
005D 89 E7 mov edi,esp
005F 8D B5 84 41 FF FF lea esi,-0xbe7c[ebp]
0065 F3 A5 rep movsd << ACCESS VIOLATION EXCEPTION
0067 BA 64 00 00 00 mov edx,0x00000064
006C 8B 45 F4 mov eax,dword ptr -0xc[ebp]
006F E8 00 00 00 00 call void near std::vector<Pepe near,std::allocator<Pepe near > near >::resize( int unsigned, Pepe )
0074 L$2:
0074 89 EC mov esp,ebp
0076 5D pop ebp
0077 5F pop edi
0078 5E pop esi
0079 5A pop edx
007A 59 pop ecx
007B 5B pop ebx
007C C3 ret

I'm trying to understand why for a stack size of 1M (st=1m linker option),
the allowable "b" array size is smaller than what I would have expected. I
made the following modification to struct Pepe and found that the error
occurs when the "b" array size is increased from 24368 (passes) to 24369
(fails). This result is also generated with compiler optimization flag set
to -ox (average space and time).

//#define IMAX 24368 /* pass */
#define IMAX 24369 /* fail */

// Simple struct with a default constructor
struct Pepe {
Pepe() {
a = 10;
memset(b, 1, sizeof(b) );
}

int a;
char b[IMAX];
};

I disassembled the program compiled with optimizations disabled (compiler
flag -od). The following two lines were added to the beginning of resize1;
all the other statements were identical. This program ran normally with
IMAX set to 24369, the value that fails under -ox.

0000 68 D0 1D 01 00 push 0x00011dd0
0005 E8 00 00 00 00 call __CHK

Stepping through __CHK in the assembly code, it appears to detect a stack
overflow then grow the stack accordingly to correct it. This is what is
missing in the optimized code. I suspect that C++ only allocates a certain
minimal amount of stack by default and this can be increased up to a maximum
level specified by the linker. This minimum appears to be around 24K. If
your code exceeds the minimal stack requirements, under optimization it's
not clear what will catch this and correct it. This seems to be the issue
and not the resize method.
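
Working through the constants in the two listings may help with the question of why the threshold is so small. What follows is only arithmetic on the numbers shown, plus one reading of them that is an assumption: the -ox code appears to hold three 24376-byte images of Pepe on the stack at once (the constructed object at -0x5f44[ebp], a copy at -0xbe7c[ebp], and a second copy at the new esp). The constant and function names are made up for the sketch.

```cpp
#include <cstddef>

// Constants copied from the listings of resize1 (IMAX = 24369).
const std::size_t kMemsetLen  = 0x5f31;   // ebx for memset_: the b array
const std::size_t kCopyDwords = 0x17ce;   // ecx for each rep movsd
const std::size_t kFrameBytes = 0xbe7c;   // first  sub esp
const std::size_t kArgBytes   = 0x5f38;   // second sub esp
const std::size_t kChkRequest = 0x11dd0;  // value pushed before __CHK (-od)

// Bytes moved by one rep movsd (ecx counts 4-byte dwords).
std::size_t copy_bytes() { return kCopyDwords * 4; }

// Round n up to the next multiple of 4.
std::size_t round_up4(std::size_t n) { return (n + 3) / 4 * 4; }

// Stack consumed below ebp once both sub esp instructions have run.
std::size_t total_below_ebp() { return kFrameBytes + kArgBytes; }
```

With these values, copy_bytes() is 24376, which matches a 4-byte int plus the 24369-byte array rounded up to a multiple of 4. The first sub esp (48764) is two such images plus the 12 bytes of locals at -0xc, -0x8 and -0x4, and the second sub esp adds a third image, for 73140 bytes below ebp. That is more than 64K (65536 bytes) and close to the 73168 bytes passed to __CHK in the -od build. So a "b" array of about 24K already costs roughly 72K of stack in this one function. Whether that is the whole explanation depends on how much stack is actually committed when resize1 runs, which the listing does not show.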

I'll try clarifying this later this week.

Marty
Peter C. Chapin
2012-11-27 12:49:38 UTC
Post by Marty Stanquist
Our situation is a bit different than what is described in Andre's blog.
Microsoft has 2 resize methods. We just have one, the second one in the
article (see below).
We only have one std::vector::resize() due to my laziness. I believe if
one uses pointers to members there could be problems since there is no
actual method void resize(size_type) and only a method void
resize(size_type, Type) (where "Type" is the template parameter).

Anyway, 'c' is passed by value and so consumes stack space (when
sizeof(Type) is large). Even in void resize(size_type), if it was a
separate method, a default constructed value of type 'Type' would need
to be made and stored somewhere... probably on the stack unless special
measures were taken to do so elsewhere.
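
The by-value point can be seen in isolation with a small counting type. This is a sketch with hypothetical names (Probe, take_by_value, take_by_reference), not code from the Open Watcom headers: it only shows that a parameter declared like 'Type c' costs one copy of the argument per call, while a const reference costs none.

```cpp
#include <cstddef>

// Hypothetical probe type that counts how often it is copy-constructed.
struct Probe {
    static int copies;
    Probe() {}
    Probe(const Probe&) { ++copies; }
};
int Probe::copies = 0;

// Same parameter shape as resize( size_type n, Type c ): c is a copy.
void take_by_value(std::size_t n, Probe c) { (void)n; (void)c; }

// Alternative shape: c is bound by reference, so nothing is copied here.
void take_by_reference(std::size_t n, const Probe& c) { (void)n; (void)c; }

// Copy-constructor calls caused by passing an existing object by value.
int copies_by_value()
{
    Probe p;
    Probe::copies = 0;
    take_by_value(1, p);
    return Probe::copies;
}

// Copy-constructor calls caused by passing the same object by reference.
int copies_by_reference()
{
    Probe p;
    Probe::copies = 0;
    take_by_reference(1, p);
    return Probe::copies;
}
```

When the object is as large as Pepe, that one copy is a full-size image placed in the callee's argument area, in addition to wherever the default-constructed value was built.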
Post by Marty Stanquist
Stepping through __CHK in the assembly code, it appears to detect a
stack overflow then grow the stack accordingly to correct it. This is
what is missing in the optimized code. I suspect that C++ only allocates
a certain minimal amount of stack by default and this can be increased
up to a maximum level specified by the linker. This minimum appears to
be around 24K. If your code exceeds the minimal stack requirements,
under optimization it's not clear what will catch this and correct it.
Interesting. So it sounds like you're saying optimized C++ code can
never use a stack larger than 24K, give or take. Is that in one frame or
total? Either way it sounds like an undesirable limitation.

Peter
Uwe Schmelich
2012-11-27 15:04:28 UTC
--snip--
Post by Peter C. Chapin
Post by Marty Stanquist
Stepping through __CHK in the assembly code, it appears to detect a
stack overflow then grow the stack accordingly to correct it. This is
what is missing in the optimized code. I suspect that C++ only allocates
a certain minimal amount of stack by default and this can be increased
up to a maximum level specified by the linker. This minimum appears to
be around 24K. If your code exceeds the minimal stack requirements,
under optimization it's not clear what will catch this and correct it.
Interesting. So it sounds like you're saying optimized C++ code can
never use a stack larger than 24K, give or take. Is that in one frame or
total? Either way it sounds like an undesirable limitation.
Peter
You may take a look at the 'Watcom User's Guide Help' under compiler
option -sg. There is some info about the stack guard mechanism involved
under Windows and OS/2 and about the linker STACK option and the COMMIT
directive. If you really need a large stack, you should use COMMIT to be on
the safe side and not only OP Stack. At least I had to do this in the past
to avoid large-stack crashes (without templates).
Compiler Option -st may be worth a short look too.

Uwe
Marty Stanquist
2012-11-27 19:58:28 UTC
There seem to be two possible fixes for the optimized build:

1) Use compiler option "-sg" and linker option "op st=1m"

2) Use linker option "op st=1m com st=1m"

Both of these work.

The first option sets the maximum stack size to 1M and adds a subroutine
call to an assembly routine called __GRO to the beginning of each procedure,
including resize1. Subroutine __GRO makes the necessary stack adjustments to
accommodate the values contained in the calling parameter list.

The second option sets both the maximum stack size and the commit stack size
to 1M. According to the linker documentation for the /COMMIT directive,
the operating system allocates only a portion of the total available stack
to an application when it is initially loaded. This initial value is
specified by the linker /COMMIT directive. If /COMMIT is omitted, the
minimal stack size is set to the lesser of the value specified by the linker
/STACK directive or 64K (depending on the operating system). Without /STACK
and /COMMIT selected, I was able to verify that the program does initialize
the stack to 10,000 hex or 64K decimal as listed in the link map file.
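
To illustrate why both fixes can work, here is a toy model of a stack that is only partly committed and grows through a guard page, which is how Windows stacks generally behave. It is a simplified simulation with made-up names (StackModel, touch, probe_to), not the actual __GRO or __CHK code, and it is an assumption that those routines probe page by page in this way.

```cpp
#include <cstddef>

const std::size_t kPage = 4096;

// Simplified model of a guard-page stack. Depth 0 is the top of the stack;
// the page just below the committed region acts as the guard page.
class StackModel {
public:
    StackModel(std::size_t reserved_bytes, std::size_t committed_bytes)
        : reserved_pages_(reserved_bytes / kPage),
          committed_pages_(committed_bytes / kPage) {}

    // Touch one byte at the given depth. Returns false on a simulated fault.
    bool touch(std::size_t depth)
    {
        std::size_t page = depth / kPage;
        if (page < committed_pages_)
            return true;                      // already committed
        if (page == committed_pages_ && page < reserved_pages_) {
            ++committed_pages_;               // guard hit: grow by one page
            return true;
        }
        return false;                         // jumped past the guard page
    }

    // Touch one byte in every page from the top down to depth.
    bool probe_to(std::size_t depth)
    {
        for (std::size_t d = 0; d < depth; d += kPage)
            if (!touch(d))
                return false;
        return touch(depth);
    }

private:
    std::size_t reserved_pages_;
    std::size_t committed_pages_;
};
```

In this model, a stack with 1M reserved and 64K committed faults when the first access lands well below the guard page, succeeds if every page on the way down is touched first (the -sg style of fix), and also succeeds if the whole 1M is committed up front (the com st=1m style of fix).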

This does explain why the linker directive st=1m by itself does not solve
the problem, but it does not explain why we observed minimal stack sizes
between 24K and 32K during troubleshooting. All my program builds had stack
frames disabled in the IDE. I'll investigate the multiple stack frame
question this week. Going forward, I guess we should look into ways of
providing better stack information to the user during the compile and link.
I'll research this.

Marty