Discussion:
Bug in C compiler, not C++, version 1.90
(too old to reply)
Chip
2012-07-21 01:12:03 UTC
Permalink
I have not tested earlier versions, but the following code does not build
correctly with wcc or wcc386. The two functions _should_ compile to be
identical, but the first function is compiled as if the "flag = 1;"
statement is part of the while(). I compiled the code with wpp and wpp386,
both produced correct code.

Chip

-----------------------------------------

#if defined(__WATCOMC__) && defined(__cplusplus)
# if defined(__386__)
# pragma aux _pause = "pause" modify []
# else
# pragma aux _pause = "db 0xF3, 0x90" modify []
# endif
#else
# if defined(__386__)
# define _pause() __asm pause
# else
# define _pause() __asm _emit 0xF3 __asm _emit 0x90
# endif
#endif

extern void __cdecl _xchg( void * pLocation1, void * pLocation2, unsigned
uByteCount );
extern void __cdecl _clflush( void * pAddress );

void __cdecl GetSemaphore( char * pLock )
{
char flag;

do
{
while ( *pLock != 0 )
_pause();

flag = 1;
_xchg( pLock, &flag, 1 ); /* issue "xchg" on the one byte
semaphore */
} while ( flag == 1 );
_clflush( pLock ); /* flush the lock from this thread/CPU
cache */
}

void __cdecl GetSemaphore2( char * pLock )
{
char flag;

do
{
while ( *pLock != 0 )
{
_pause();
}
flag = 1;
_xchg( pLock, &flag, 1 ); /* issue "xchg" on the one byte
semaphore */
} while ( flag == 1 );
_clflush( pLock ); /* flush the lock from this thread/CPU
cache */
}

-----------------------------------------

Module: M:\semaphore1\semaphore.c
GROUP: 'DGROUP' CONST,CONST2,_DATA

Segment: _TEXT BYTE USE32 00000085 bytes
0000 _GetSemaphore:
0000 68 24 00 00 00 push 0x00000024
0005 E8 00 00 00 00 call __CHK
000A 53 push ebx
000B 56 push esi
000C 57 push edi
000D 55 push ebp
000E 83 EC 04 sub esp,0x00000004
0011 8B 6C 24 18 mov ebp,dword ptr 0x18[esp]
0015 L$1:
0015 80 7D 00 00 cmp byte ptr [ebp],0x00
0019 74 08 je L$2
001B F3 90 pause
001D C6 04 24 01 mov byte ptr [esp],0x01
0021 EB F2 jmp L$1
0023 L$2:
0023 6A 01 push 0x00000001
0025 8D 44 24 04 lea eax,0x4[esp]
0029 50 push eax
002A 55 push ebp
002B E8 00 00 00 00 call __xchg
0030 83 C4 0C add esp,0x0000000c
0033 80 3C 24 01 cmp byte ptr [esp],0x01
0037 74 DC je L$1
0039 L$3:
0039 55 push ebp
003A E8 00 00 00 00 call __clflush
003F 83 C4 04 add esp,0x00000004
0042 83 C4 04 add esp,0x00000004
0045 5D pop ebp
0046 5F pop edi
0047 5E pop esi
0048 5B pop ebx
0049 C3 ret

Routine Size: 74 bytes, Routine Base: _TEXT + 0000

004A _GetSemaphore2:
004A 68 24 00 00 00 push 0x00000024
004F E8 00 00 00 00 call __CHK
0054 53 push ebx
0055 56 push esi
0056 57 push edi
0057 55 push ebp
0058 83 EC 04 sub esp,0x00000004
005B 8B 6C 24 18 mov ebp,dword ptr 0x18[esp]
005F L$4:
005F 80 7D 00 00 cmp byte ptr [ebp],0x00
0063 74 04 je L$5
0065 F3 90 pause
0067 EB F6 jmp L$4
0069 L$5:
0069 C6 04 24 01 mov byte ptr [esp],0x01
006D 6A 01 push 0x00000001
006F 8D 44 24 04 lea eax,0x4[esp]
0073 50 push eax
0074 55 push ebp
0075 E8 00 00 00 00 call __xchg
007A 83 C4 0C add esp,0x0000000c
007D 80 3C 24 01 cmp byte ptr [esp],0x01
0081 74 DC je L$4
0083 EB B4 jmp L$3

Routine Size: 59 bytes, Routine Base: _TEXT + 004A

No disassembly errors

Segment: CONST DWORD USE32 00000000 bytes

Segment: CONST2 DWORD USE32 00000000 bytes

Segment: _DATA DWORD USE32 00000000 bytes
Hans-Bernhard Bröker
2012-07-21 21:35:01 UTC
Permalink
The two functions _should_ compile to be identical,
Based on what documentation did you arrive at that expectation?

For starters, the way you use __asm fails to match the OW documentation
(which always shows a { } block following this keyword, and there is no
mention of __emit being a supported pseudo). What you should be
surprised about is thus that this does anything even remotely what
resembling what you wanted.

Second, the macro _pause in the x86 C version of your code looks
insufficiently parenthesized, which should make you _very_ wary using it
the way you did it in your first example function.
but the first function is compiled as if the "flag =
1;" statement is part of the while().
I'm afraid you've invoked undefined behaviour by that point, by using
undocumented syntax. So: anything could happen.

In a nutshell, that while loop doesn't necessarily terminate where you
think it does, because that semicolon following _pause may not actually
be the C statement separator you think it is.
I compiled the code with wpp and wpp386, both produced correct code.
That's a rather misleading statement, because your code itself is
different when compiled by WPP than it is for WCC, by way of the
__cplusplus macro.

Which, BTW, begs a question: you used _pragma aux style inline assembly
for the C++ version of your code --- so why not for the C version?
Chip
2012-07-24 05:34:27 UTC
Permalink
Post by Hans-Bernhard Bröker
The two functions _should_ compile to be identical,
Based on what documentation did you arrive at that expectation?
For starters, the way you use __asm fails to match the OW documentation
(which always shows a { } block following this keyword, and there is no
mention of __emit being a supported pseudo). What you should be surprised
about is thus that this does anything even remotely what resembling what
you wanted.
Second, the macro _pause in the x86 C version of your code looks
insufficiently parenthesized, which should make you _very_ wary using it
the way you did it in your first example function.
For references to _emit with __asm and without using curly braces, please
check this page on the OpenWatcom website:
http://www.openwatcom.org:4000/@md=d&cd=//&cdf=//depot/ow_devel/zdos/bench/support/timer.c&ra=s&c=Ytn@//depot/ow_devel/zdos/bench/support/timer.c?ac=64


The C compiler behaves differently than C++ with respect to #pragma
statements. I wrote the code so that it will compile.
Jiri Malak
2012-07-24 09:48:37 UTC
Permalink
Yes, it is compiler bug.
If you are interested in fixed version of OW then you can get it from
git://github.com/jmalak/open-watcom.git

Jiri
Post by Chip
Post by Hans-Bernhard Bröker
The two functions _should_ compile to be identical,
Based on what documentation did you arrive at that expectation?
For starters, the way you use __asm fails to match the OW documentation
(which always shows a { } block following this keyword, and there is no
mention of __emit being a supported pseudo). What you should be
surprised about is thus that this does anything even remotely what
resembling what you wanted.
Second, the macro _pause in the x86 C version of your code looks
insufficiently parenthesized, which should make you _very_ wary using it
the way you did it in your first example function.
For references to _emit with __asm and without using curly braces, please
The C compiler behaves differently than C++ with respect to #pragma
statements. I wrote the code so that it will compile.
Hans-Bernhard Bröker
2012-07-24 18:27:51 UTC
Permalink
Post by Jiri Malak
Yes, it is compiler bug.
If you are interested in fixed version of OW then you can get it from
git://github.com/jmalak/open-watcom.git
Any particular reason you couldn't put it into the real, P4 repository?
Jiri Malak
2012-07-25 06:55:05 UTC
Permalink
Post by Hans-Bernhard Bröker
Post by Jiri Malak
Yes, it is compiler bug.
If you are interested in fixed version of OW then you can get it from
git://github.com/jmalak/open-watcom.git
Any particular reason you couldn't put it into the real, P4 repository?
Yes, I was again atack by Mr.Necasek with the knowledge of Marty Stanquist.
I don't want to continue with Perforce because it enable these guyes to remove access to code line
anytime and to anybody.
It was done in the past for several people without any discussion and without any rules etc.
I don't want to participate on project with similar practise.
OW project is marketed by an intolerance Mr. Necasek to different view then his own.

It is reason why I did fork of OW to Git to enable realy "Open" development without any similar
practise.

Anybody with any knowledge and experiencies is welcome to participate on this fork.

Jiri
Leif Ekblad
2012-07-25 18:52:24 UTC
Permalink
Post by Jiri Malak
Post by Hans-Bernhard Bröker
Post by Jiri Malak
Yes, it is compiler bug.
If you are interested in fixed version of OW then you can get it from
git://github.com/jmalak/open-watcom.git
Any particular reason you couldn't put it into the real, P4 repository?
Yes, I was again atack by Mr.Necasek with the knowledge of Marty Stanquist.
I don't want to continue with Perforce because it enable these guyes to
remove access to code line anytime and to anybody.
It was done in the past for several people without any discussion and
without any rules etc.
I don't want to participate on project with similar practise.
OW project is marketed by an intolerance Mr. Necasek to different view then his own.
It is reason why I did fork of OW to Git to enable realy "Open"
development without any similar practise.
Anybody with any knowledge and experiencies is welcome to participate on this fork.
Jiri
I'm sorry to hear that. The OW project doesn't need these kinds of
conflicts, especially
not when it is one of the few people that contributes a lot to the project
that is involved.

Leif Ekblad

Hans-Bernhard Bröker
2012-07-24 18:15:35 UTC
Permalink
Post by Chip
For references to _emit with __asm and without using curly braces,
That's just some code someone else wrote ignoring the documentation.
That's not a reference to anything.
Post by Chip
The C compiler behaves differently than C++ with respect to #pragma
statements. I wrote the code so that it will compile.
You missed my point: in the case at hand there was no need for that.
The C compiler would have accepted that #pragma aux code just fine, and
done the right thing. Plus it would have had the advantage of being
covered by documented behaviour.
Marek Vondrak
2012-07-21 23:23:38 UTC
Permalink
Well, unless you use fences any threading code will just rely on the
particular internal workings of a given version of a compiler. From the
point of view of the compiler, it does not matter whether the "flag = 1"
statement happens inside the while loop or outside, because the
difference can not be observed by anyone who reads "flag" using the
regular read operations. If you want to synchronize between
cores/CPUs, you need to use fences and you need a proper compiler
support for it (google for memory model for C++ and C and
why compiler optimizers need to know about existence of threads).

-Marek
Leif Ekblad
2012-07-22 07:38:49 UTC
Permalink
The correct way to write the code would be to code it all in assembly, as
this
would not be dependent on optimizing. Fences are not strictly needed, but
they could optimize performance.

The code presented seems to be some kind of spinlock, which
I implement like this (asm code):

lockloop:
mov eax,lock_var
or eax,eax
jz getlock
;
pause
jmp lockloop

getlock:
inc eax
xchg eax,lock_var
or eax,eax
jnz lockloop

Even if this code works as a semaphore, it has some problems, especially
if the semaphore is busy a lot as then many CPUs in the system will be
executing the lock-loop rather than some useful code.


Leif Ekblad
Post by Marek Vondrak
Well, unless you use fences any threading code will just rely on the
particular internal workings of a given version of a compiler. From the
point of view of the compiler, it does not matter whether the "flag = 1"
statement happens inside the while loop or outside, because the difference
can not be observed by anyone who reads "flag" using the regular read
operations. If you want to synchronize between cores/CPUs, you need to use
fences and you need a proper compiler support for it (google for memory
model for C++ and C and
why compiler optimizers need to know about existence of threads).
-Marek
Bartosz Polednia
2012-07-22 14:40:39 UTC
Permalink
Post by Leif Ekblad
The correct way to write the code would be to code it all in assembly, as
[ciach-ciach]
Post by Leif Ekblad
Even if this code works as a semaphore, it has some problems, especially
if the semaphore is busy a lot as then many CPUs in the system will be
executing the lock-loop rather than some useful code.
Leif Ekblad
You should use lock prefix to have exclusive access to memory.
Explanation here:
http://web.cs.du.edu/~dconnors/courses/comp3361/notes/lock.pdf

Bartosz.
Leif Ekblad
2012-07-22 18:10:21 UTC
Permalink
Post by Bartosz Polednia
Post by Leif Ekblad
The correct way to write the code would be to code it all in assembly, as
[ciach-ciach]
Post by Leif Ekblad
Even if this code works as a semaphore, it has some problems, especially
if the semaphore is busy a lot as then many CPUs in the system will be
executing the lock-loop rather than some useful code.
Leif Ekblad
You should use lock prefix to have exclusive access to memory.
http://web.cs.du.edu/~dconnors/courses/comp3361/notes/lock.pdf
Bartosz.
The xchg instruction doesn't need an explicit lock since it is always
locked.
The cmpxchg instruction does need a lock, but it is not available on 386,
so I don't use it.

Leif Ekblad
Chip
2012-07-24 05:36:48 UTC
Permalink
Post by Marek Vondrak
Well, unless you use fences any threading code will just rely on the
particular internal workings of a given version of a compiler. From the
point of view of the compiler, it does not matter whether the "flag = 1"
statement happens inside the while loop or outside, because the difference
can not be observed by anyone who reads "flag" using the regular read
operations. If you want to synchronize between cores/CPUs, you need to use
fences and you need a proper compiler support for it (google for memory
model for C++ and C and why compiler optimizers need to know about
existence of threads).
Marek,

What you don't see in the snippet provided is all the other MP code that is
written in a combination of C, C++, and Assembly. None of that other stuff
is relevant to the reporting of a compiler bug.

Chip
Loading...