Discussion:
https://github.com/bartg/langs/tree/master/bccproj
jacobnavia
2017-04-24 21:33:27 UTC
Hi Bart

Got your compiler, looks impressive. Tried to compile it and got a
problem with

bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);

What is that strmode?

I did not know that function.
bartc
2017-04-24 22:14:46 UTC
Post by jacobnavia
Hi Bart
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
OK, strmode() is one of my functions (convert a type, or 'mode', into
string). Presumably it clashes with a C library function called
'strmode' (although I didn't see the problem with Windows compilers, nor
gcc on Linux).

I guess a quick fix is to rename 'strmode' to something else throughout
the file. And I will rename that function in the meantime. (In my source
language, str- names are not reserved.)

(Note: if using lccwin to compile this one-file version, I've
experienced problems with lccwin64 in running the result. I think
lccwin32 is OK (you need to compile mcc32.c). I will have to look at
that in more detail to see if it's a bug in my code that only comes up with
lccwin64, or if it's an issue with lccwin64 itself.)
--
bartc
bartc
2017-04-24 23:31:29 UTC
Post by bartc
(Note: if using lccwin to compile this one-file version, I've
experienced problems with lccwin64 in running the result. I think
lccwin32 is OK (you need to compile mcc32.c). I will have to look at
that in more detail to see if it's a bug that only comes up with
lccwin64, or if it's an issue with lccwin64.)
Sorry, I think it's a bug in lccwin64. I reduced it down to this (from
17000 lines):

#include <stdio.h>
#include <stddef.h>

struct _tokenrec {
    union {
        double xvalue;
        char * svalue;
    };
    struct _tokenrec* nexttoken;
    struct {
        char subcode;
    };
};

int main(void) {
    /* sizeof and offsetof yield size_t, so cast to int to match %d */
    printf("SIZE: %d\n", (int)sizeof(struct _tokenrec));
    printf("OFFSET: %d\n", (int)offsetof(struct _tokenrec, subcode));
    return 0;
}


The correct output should be a size of 24 bytes, and an offset for
member 'subcode' of 16.

lccwin64 gives the offset as 32. (Version 4.1 27-10-16 which appears to
be the latest.)
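The expected figures follow from the usual "pad each member up to its alignment" rule. Here is a minimal sketch of that arithmetic, assuming a C11 compiler; the tag tokenrec and the helpers align_up, max_size and predicted_subcode_offset are illustrative names, not part of the original test case. On a typical 64-bit ABI with 8-byte doubles and pointers it predicts an offset of 16, which then rounds up to a total size of 24.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the reduced test case (tag renamed for illustration). */
struct tokenrec {
    union {
        double xvalue;
        char *svalue;
    };
    struct tokenrec *nexttoken;
    struct {
        char subcode;
    };
};

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align) {
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

static size_t max_size(size_t a, size_t b) {
    return a > b ? a : b;
}

/* Offset of subcode predicted by laying the members out in order. */
static size_t predicted_subcode_offset(void) {
    size_t union_align = max_size(_Alignof(double), _Alignof(char *));
    size_t union_size = align_up(max_size(sizeof(double), sizeof(char *)),
                                 union_align);
    size_t off = 0;

    off = align_up(off, union_align) + union_size;     /* anonymous union  */
    off = align_up(off, _Alignof(struct tokenrec *));
    off += sizeof(struct tokenrec *);                   /* nexttoken        */
    off = align_up(off, _Alignof(char));                /* anonymous struct */
    return off;
}
```

Comparing predicted_subcode_offset() with offsetof(struct tokenrec, subcode) gives a quick check on whether a compiler follows this layout rule.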
--
bartc
Scott Lurndal
2017-04-25 14:46:05 UTC
Post by bartc
Post by jacobnavia
Hi Bart
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
OK, strmode() is one of my functions (convert a type, or 'mode', into
string). Presumably it clashes with a C library function called
'strmode' (although I didn't see the problem with Windows compilers, nor
gcc on Linux).
If you're adding functions to the implementation that aren't defined
by one of the relevant standards, should you not be prepending at least
one underscore to the name? Otherwise, you'll likely break existing
programs (as per above).
bartc
2017-04-25 16:05:02 UTC
Post by Scott Lurndal
Post by bartc
Post by jacobnavia
Hi Bart
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
OK, strmode() is one of my functions (convert a type, or 'mode', into
string). Presumably it clashes with a C library function called
'strmode' (although I didn't see the problem with Windows compilers, nor
gcc on Linux).
If you're adding functions to the implementation that aren't defined
by one of the relevant standards, should you not be prepending at least
one underscore to the name? Otherwise, you'll likely break existing
programs (as per above).
This is actually similar to the problem in the other thread about
avoiding "$" in names of C identifiers because some C compilers
(purportedly because of assembler or linker limitations) don't allow them.

In this case it's because the translator from a source language (not C)
to C has to be aware of C's rather bizarre restriction that the prefix
'str-' is reserved and can't be used for user identifiers. (So 'weak' is
OK, but not 'strong'; presumably not even 'string' is legal.)

This requires that the translator somehow escapes an 'str-' prefix.
(Into $str_ might have been a good choice, but...)
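A minimal sketch of such an escape step, written in C for illustration. It assumes the translator only needs to worry about the names the C standard reserves for future library use that begin with str, mem or wcs followed by a lowercase letter. The "x_" prefix and the function names has_reserved_prefix and escape_name are hypothetical, not what the translator actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if name starts with str, mem or wcs followed by a lowercase letter. */
static int has_reserved_prefix(const char *name) {
    static const char *const prefixes[] = { "str", "mem", "wcs" };
    size_t i;

    for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        size_t n = strlen(prefixes[i]);
        if (strncmp(name, prefixes[i], n) == 0 &&
            islower((unsigned char)name[n]))
            return 1;
    }
    return 0;
}

/* Write the C-safe spelling of name into out.
   Returns 0 on success, -1 if out is too small. */
static int escape_name(const char *name, char *out, size_t outsize) {
    int n;

    assert(name != NULL && out != NULL);
    n = snprintf(out, outsize, "%s%s",
                 has_reserved_prefix(name) ? "x_" : "", name);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

With this, "strmode" would be emitted as "x_strmode" while "weak" passes through unchanged. A real translator would also have to make sure the escaped spelling cannot collide with another user name.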

However, it was easier, for this one-off problem, to just rename strmode
to Strmode in the original source.

(It's still a puzzle why I didn't see the clash on my Linux.)
--
bartc
Ben Bacarisse
2017-04-26 00:32:45 UTC
<snip>
Post by bartc
Post by bartc
OK, strmode() is one of my functions (convert a type, or 'mode', into
string). Presumably it clashes with a C library function called
'strmode' (although I didn't see the problem with Windows compilers, nor
gcc on Linux).
<snip>
Post by bartc
(It's still a puzzle why I didn't see the clash on my Linux.)
Didn't "man strmode" solve the puzzle for you? (I'd have written
strmode(3) except that notation seems to be almost unknown these days.)
--
Ben.
bartc
2017-04-26 10:32:05 UTC
Post by Ben Bacarisse
Post by bartc
(It's still a puzzle why I didn't see the clash on my Linux.)
Didn't "man strmode" solve the puzzle for you? (I'd have written
strmode(3) except that notation seems to be almost unknown these days.)
Well, the answer is that 'strmode()' doesn't exist on my Linux.

Otherwise I would have noticed the clash in my tests and done something
about it.

(The puzzle then becomes why strmode() isn't there; maybe it's only on
FreeBSD or something because I first tried 'man strmode' online and it
was on a site for FreeBSD. I think jacob uses a Mac so that could be
another difference.)
--
bartc
Ben Bacarisse
2017-04-26 11:09:08 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
(It's still a puzzle why I didn't see the clash on my Linux.)
Didn't "man strmode" solve the puzzle for you? (I'd have written
strmode(3) except that notation seems to be almost unknown these days.)
Well, the answer is that 'strmode()' doesn't exist on my Linux.
Otherwise I would have noticed the clash in my tests and done
something about it.
Not necessarily. strmode is there on my Linux system but it needs
bsd/string.h to be included so you would not notice a clash despite it
being there. (That's why I suggested using "man" -- I thought you were
puzzled about why you got no conflict despite it being on your system.)
Post by bartc
(The puzzle then becomes why strmode() isn't there;
Again, I'm not sure what the puzzle is. Why should it be there if
you've not put it there? It's on my system because I've installed the
"dev" version of libbsd (the version you need to write code using libbsd
rather than simply linking against it).
Post by bartc
maybe it's only on
FreeBSD or something because I first tried 'man strmode' online and it
was on a site for FreeBSD. I think jacob uses a Mac so that could be
another difference.)
On BSD and BSD-derived systems it is likely to be declared in string.h but
I would hope that the declaration would be hidden when compiling with
flags that ask for (some version of) standard C conformance. Because
it's a reserved name, the compiler does not have to hide it in
conforming mode, but it's very useful to be able to do that to check
code for non-standard dependencies.
--
Ben.
Tim Rentsch
2017-04-27 21:32:56 UTC
[...concerning a declaration for strmode() in <string.h>...]
On BSD and BSD-derived systems it is likely to be declared in string.h but
I would hope that the declaration would be hidden when compiling with
flags that ask for (some version of) standard C conformance. Because
it's a reserved name, the compiler does not have to hide it in
conforming mode, but it's very useful to be able to do that to check
code for non-standard dependencies.
This idea strikes me as somewhat strange. Not because I think
it's useless, but because it seems like it would only very rarely
be useful. What I think would be more useful is a compiler-like
program that simply checks whether any such symbols are used in
the program source (as opposed to being in standard header files,
where they are allowed). Also the results are the same across
different implementations, so it could be written basically just
once (there are things like -I flags to worry about, but those
shouldn't be hard to provide in a general way).
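A very rough sketch of what the core of such a checker might look like. It assumes only two of the reserved patterns matter (a leading underscore followed by an underscore or an uppercase letter, and str/mem/wcs followed by a lowercase letter), and it does not skip comments, string literals or numeric suffixes. The names is_reserved_ident and count_reserved are made up for this example. A real tool would also need to tell a program's own definitions apart from legitimate uses of library functions such as strlen.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the identifier id[0..len) matches one of the patterns above. */
static int is_reserved_ident(const char *id, size_t len) {
    if (len >= 2 && id[0] == '_' &&
        (id[1] == '_' || isupper((unsigned char)id[1])))
        return 1;
    if (len >= 4 &&
        (strncmp(id, "str", 3) == 0 || strncmp(id, "mem", 3) == 0 ||
         strncmp(id, "wcs", 3) == 0) &&
        islower((unsigned char)id[3]))
        return 1;
    return 0;
}

/* Count reserved-looking identifiers in a piece of source text. */
static int count_reserved(const char *src) {
    const char *p = src;
    int count = 0;

    assert(src != NULL);
    while (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            count += is_reserved_ident(start, (size_t)(p - start));
        } else {
            p++;
        }
    }
    return count;
}
```

Because the patterns come from the language standard and not from any one implementation, the result would be the same everywhere, which is the point being made above.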
Thiago Adams
2017-04-28 13:23:39 UTC
Bart,
I notice the source of your program is in your other language and then
you translate to C.

There is one C++ compiler (Comeau), apart from Cfront, that generates C code.
I am curious about the output and how they managed the generation, but
I can't find samples, or a trial download, etc.

Have you heard about comeau?

Has anyone used this compiler?
http://www.comeaucomputing.com/

I can imagine many ways to translate to C, but I would like to
see how others do it.

I found some details about how it was done in CFront as well.

ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html

The most interesting part is that both compilers have templates.
The code generation for "automatic instantiation" is very interesting.

Are you planning to add more features to your language that translates to C?

Does your current C compiler do everything in one pass? (And how about the compiler that translates to C?)

Do you have function overloading in your other language?
Scott Lurndal
2017-04-28 14:26:13 UTC
Post by Thiago Adams
Bart,
I notice the source of your program is in your other language and then
you translate to C.
There is one C++ compiler (comeau), apart of CFront, that generates C code.
I am curious about the output. How they managed the generation etc.., but
I can't find samples, our trial download etc.
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
To be fair, cfront didn't have template support until version 3.0. And
it was not particularly usable, at least in the environments for which we
were using Cfront 2.1 (Operating Systems), as it ballooned the code
footprint enormously.
Thiago Adams
2017-04-28 18:04:18 UTC
Post by Scott Lurndal
Post by Thiago Adams
Bart,
I notice the source of your program is in your other language and then
you translate to C.
There is one C++ compiler (comeau), apart of CFront, that generates C code.
I am curious about the output. How they managed the generation etc.., but
I can't find samples, our trial download etc.
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
To be fair, cfront didn't have template support until version 3.0. And
it was not particularly usable, at least in the environments for which we
were using Cfront 2.1 (Operating Systems), as it ballooned the code
footprint enormously.
I would like to know if one cpp file generates one c file? (excluding templates)

file1.cpp file1.h -- translated to -> file1.c file1'.h
or
file1.cpp file1.h -- translated to -> file1.c (with declarations inside)
or

file1.cpp file2.cpp ... file1.h -- translated to -> big.c
Scott Lurndal
2017-04-28 18:26:44 UTC
Post by Thiago Adams
Post by Scott Lurndal
Post by Thiago Adams
Bart,
I notice the source of your program is in your other language and then
you translate to C.
There is one C++ compiler (comeau), apart of CFront, that generates C code.
I am curious about the output. How they managed the generation etc.., but
I can't find samples, our trial download etc.
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
To be fair, cfront didn't have template support until version 3.0. And
it was not particularly usable, at least in the environments for which we
were using Cfront 2.1 (Operating Systems), as it ballooned the code
footprint enormously.
I would like to know if one cpp file generates one c file? (excluding templates)
file1.cpp file1.h -- translated to -> file1.c file1'.h
or
file1.cpp file1.h -- translated to -> file1.c (with declarations inside)
or
file1.cpp file2.cpp ... file1.h -- translated to -> big.c
For each C++ source file, the pipe was effectively:

cat file1.CC | c++cpp | cfront | cc -c file1.o
Thiago Adams
2017-04-28 19:59:17 UTC
Post by Scott Lurndal
Post by Thiago Adams
Post by Scott Lurndal
Post by Thiago Adams
Bart,
I notice the source of your program is in your other language and then
you translate to C.
There is one C++ compiler (comeau), apart of CFront, that generates C code.
I am curious about the output. How they managed the generation etc.., but
I can't find samples, our trial download etc.
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
To be fair, cfront didn't have template support until version 3.0. And
it was not particularly usable, at least in the environments for which we
were using Cfront 2.1 (Operating Systems), as it ballooned the code
footprint enormously.
I would like to know if one cpp file generates one c file? (excluding templates)
file1.cpp file1.h -- translated to -> file1.c file1'.h
or
file1.cpp file1.h -- translated to -> file1.c (with declarations inside)
or
file1.cpp file2.cpp ... file1.h -- translated to -> big.c
cat file1.CC | c++cpp | cfront | cc -c file1.o
I presume the declaration was injected into the c file.
The class declaration from the header file is injected into out.c
in the form of a struct.
The other option would be to create a new header file that
translates from class to struct, and out.c would include this
new header.
I think the intention of CFront was not to create a readable
or friendly C file.
I don't know how people debug the source using CFront.
Scott Lurndal
2017-04-28 20:29:24 UTC
Post by Thiago Adams
Post by Scott Lurndal
cat file1.CC | c++cpp | cfront | cc -c file1.o
The class declaration from header file is injected at out.c
in a form of struct.
A struct was created in the C file for the non-static data members, yes.
Post by Thiago Adams
The other option would be to create a new header file that
translate from class to struct and the out.c would include this
new header.
No, all the preprocessing was done before cfront ever saw the
code (that's the c++cpp step, which was a distinct executable).
Post by Thiago Adams
I think the intention of CFront was not to create readable
or friendly C file.
It was not the intent to create a readable or friendly C file.

In fact, it was very much unreadable. The comma operator
was very heavily used [*].
Post by Thiago Adams
I don't know how people debug the source using CFront.
Cfront included #line directives in the C source that would
generate DWARF data associating the C code fragments with the
corresponding C++ source lines. sdb (SVR4) could
source-level debug C++ code compiled with cfront. As we
were using cfront to generate operating system code, we had
a built-in (non-source-level) kernel debugger that worked
well.
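As a small illustration of the mechanism (not actual Cfront output), a #line directive tells the C compiler which source line and file the following code should be attributed to. The file name file1.CC, the line number 100 and the function names here are hypothetical.

```c
/* Pretend this is generated C: the directive below says that the next
   line corresponds to line 100 of the C++ source file file1.CC. */
#line 100 "file1.CC"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```

Diagnostics and debug information for reported_line() then refer to file1.CC line 100 instead of the position in the generated C file, which is what makes source-level debugging of the original code possible.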

[*] I had to update the temporary register spill functions
in the Motorola 88100 version of the portable C compiler to
handle some of the huge expression trees generated when the
output of Cfront was compiled with pcc. Had to calculate the
Sethi-Ullman number and implement the spill code.

https://en.wikipedia.org/wiki/Sethi%E2%80%93Ullman_algorithm
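For readers who have not met it, here is a minimal sketch of the simple form of the Sethi-Ullman numbering, assuming every leaf needs one register. The struct expr type and the function su_number are illustrative only and have nothing to do with the actual pcc code.

```c
#include <assert.h>
#include <stddef.h>

/* Expression tree node; both children are NULL for a leaf. */
struct expr {
    struct expr *left;
    struct expr *right;
};

/* Minimum number of registers needed to evaluate e without spilling,
   using the simple rule: a leaf needs 1, a unary node needs what its
   child needs, and a binary node needs max(l, r), or l + 1 if l == r. */
static int su_number(const struct expr *e) {
    int l, r;

    assert(e != NULL);
    if (e->left == NULL && e->right == NULL)
        return 1;
    if (e->right == NULL)
        return su_number(e->left);
    if (e->left == NULL)
        return su_number(e->right);
    l = su_number(e->left);
    r = su_number(e->right);
    return l == r ? l + 1 : (l > r ? l : r);
}
```

When the number for a subtree exceeds the registers available, that is where spill code is needed; evaluating the child with the larger number first keeps the requirement down.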
bartc
2017-04-28 16:35:50 UTC
Post by Thiago Adams
I can imagine many ways of translate to C, but I would like to
see how others do.
I used to have a lot of problems with it. Partly because of a mismatch
between C and the source language. Now I've dropped some features and it
works better.

However it's still the case that C 'gets in the way' more than I would
like, with its ideas about types or about what constitutes undefined
behaviour.

One example recently talked about is dealing with names in the source language
that start with "str" or "mem", or that contain "$". (One early C
generator used $ extensively for temporary values.)
Post by Thiago Adams
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
Are you planning to add more features in your language that translates do C?
No, it's deliberately staying low-level. (I have a higher level
language, but even that has a conservative set of features.)
Post by Thiago Adams
Does your current C compiler do everything in one pass? (and how about the compiler that translate to C?)
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)

The translator from my language is 4.5 passes for native code target
(not counting an external assembler), and 4 passes for C target. (The
compiler to process the source code of that translator into dynamic
byte-code is 3 passes.)

I think the C compiler can just about be reduced to 1.5 or 1 pass, but
the quality (such as it is) and flexibility would suffer. However,
improving compile-speed isn't a priority right now (already being too
fast to measure on a PC).
Post by Thiago Adams
Do you have function overload in your other language?
No. As I said, I also use a higher level (but much slower) language,
which has dynamic types, and that lets you do a lot of things that a
faster language would require templates, generics, classes and function
and operator overloading to achieve. And it lets you do so with much nicer,
uncluttered code too.

However, such languages are slow, otherwise no one would use C or C++.

(The static language does have, built-in as operators, some features
that are functions or macros in C, and so benefit from normal
overloading. For example, 'pow', 'abs', 'min' and 'max' are operators
that can work with ints or floats.)
--
bartc
Thiago Adams
2017-04-28 20:21:18 UTC
Post by bartc
Post by Thiago Adams
I can imagine many ways of translate to C, but I would like to
see how others do.
I used to have a lot of problems with it. Partly because of a mismatch
between C and the source language. Now I've dropped some features and it
works better.
However it's still the case that C 'gets in the way' more than I would
like, with its ideas about types or about what constitutes undefined
behaviour.
One example recently talked out is dealing names in the source language
that start with "str" or "mem", or that contain "$". (One early C
generator used $ extensively for temporary values.)
Post by Thiago Adams
I found some details about how it was done in CFront as well.
ftp://public.dhe.ibm.com/software/rational/docs/docset/doc/cpf_4.2/ccase_ux/ccbuild/ccbuild-66.html
The most interesting part is that both compiler have templates.
The code generation for "automatic instantiation" is very interesting.
Are you planning to add more features in your language that translates do C?
No, it's deliberately staying low-level. (I have a higher level
language, but even that has a conservative set of features.)
Post by Thiago Adams
Does your current C compiler do everything in one pass? (and how about the compiler that translate to C?)
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Post by bartc
The translator from my language is 4.5 passes for native code target
(not counting an external assembler), and 4 passes for C target. (The
compiler to process the source code of that translator into dynamic
byte-code is 3 passes.)
I think the C compiler can just about be reduced to 1.5 or 1 pass, but
the quality (such as it is) and flexibility would suffer. However,
improving compile-speed isn't a priority right now (already being too
fast to measure on a PC).
Post by Thiago Adams
Do you have function overload in your other language?
No. As I said, I also use a higher level (but much slower) language,
which has dynamic types, and that lets you do a lot of things that a
faster language would require templates, generics, classes and function
and operator overloading to achieve. And lets you do with much nicer,
uncluttered code too.
Why did you choose your language to implement this compiler?
How do you think your language (what's the name?) helps you
compared with C on this project?
bartc
2017-04-28 21:35:37 UTC
Post by Thiago Adams
Post by bartc
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Yes. Although a one pass compiler probably wouldn't need one. (You can
see the AST by running it as 'mcc -ast program.c', it will write it out
to a file.)
Post by Thiago Adams
Post by bartc
No. As I said, I also use a higher level (but much slower) language,
which has dynamic types, and that lets you do a lot of things that a
faster language would require templates, generics, classes and function
and operator overloading to achieve. And lets you do with much nicer,
uncluttered code too.
Why did you choose your language to implement this compiler?
How do you think your language (what's the name?) help you
compared with C on this project?
I use my language because I'm more comfortable with it, can be much more
productive, and I've been using versions of it for decades, so it's very
familiar. It doesn't have the dozens of annoyances of C (type
declaration syntax, mixing up pointers and arrays, etc etc); I can't be
bothered with all that.

Also, I had been using the interpreted version of it for writing
compilers, which makes it easier than using a static language like C. It
was fast enough (the compiler could compile itself in 2 seconds -
interpreted).

But I've been looking at how fast it /could/ go, and that needed static
code. This C compiler can apparently compile itself - the one-file C
version - in 0.016 seconds:

c:\cx>mcc64 -time mcc64

MCC Compiler 4.28
Compiling mcc64.c to mcc64.asm64
17131 Lines
Load: 0 ms 0K Lines per second
Parse: 0 ms 0K Lines per second
Codegen1: 16 ms
(Load/Parse/Gen1): 16 ms 1070K Lines per second
Codegen2: 0 ms
Writeasm: 0 ms
Compile: 16 ms 1070K Lines per second
Load+Compile: 16 ms 1070K Lines per second
Program: 16 ms 1070K Lines per second

although the figures suggest problems in timer resolution when it's this
quick.
--
bartc
Thiago Adams
2017-05-01 12:16:04 UTC
Post by bartc
Post by Thiago Adams
Post by bartc
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Yes. Although a one pass compiler probably wouldn't need one. (You can
see the AST by running it as 'mcc -ast program.c', it will write it out
to a file.)
Is your preprocessor a separate step?
(In my parser, the preprocessor is inside the scanner. The output
of the scanner is only preprocessed tokens)
bartc
2017-05-01 12:47:36 UTC
Post by Thiago Adams
Post by bartc
Post by Thiago Adams
Post by bartc
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Yes. Although a one pass compiler probably wouldn't need one. (You can
see the AST by running it as 'mcc -ast program.c', it will write it out
to a file.)
Is your preprocessor a separated step?
(In my parser, the preprocessor is inside the scanner. The output
of the scanner is only preprocessed tokens)
Well, it's not a separate pass. Probably the same as yours: the
preprocessing is done as it goes along, which makes it harder.

By preprocessing, I mainly have in mind macro expansion which is where
most of the bugs are going to arise. #include and #if are trivial.

Probably, a separate stage would have been easier (convert entire input
source to tokens then preprocess). But would have been slower, and
unjustified if 99% of the preprocessed input source wasn't macro-expanded.

(I use a three-tiered tokeniser controlled by these three functions:

lexreadtoken()  Returns the next token from the actual source text.

lexm()          Calls lexreadtoken() and deals with macro expansion.

lex()           Calls lexm() and works with a token in hand, allowing
                a one-token lookahead. Called from the parser.

Without macros with parameters, the middle lexm() function could have
been dispensed with.

For preprocessing only (-E option), only lexm() is needed.)
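A toy sketch of how the three tiers might stack, heavily simplified and not taken from the mcc sources. Tokens here are just whitespace-separated words, macro expansion is reduced to one hard-wired object-like macro (A expands to 0), and lexinit() and peek() are extra helpers invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAXTOK 32

static const char *src;          /* current read position in the text */
static char nexttok[MAXTOK];     /* the token in hand (lookahead)      */
static char curtok[MAXTOK];      /* the token last returned by lex()   */

/* Tier 1: next whitespace-separated token from the raw text, "" at end. */
static void lexreadtoken(char *out) {
    size_t n = 0;

    while (isspace((unsigned char)*src))
        src++;
    while (*src != '\0' && !isspace((unsigned char)*src) && n < MAXTOK - 1)
        out[n++] = *src++;
    out[n] = '\0';
}

/* Tier 2: calls lexreadtoken() and expands the one known macro. */
static void lexm(char *out) {
    lexreadtoken(out);
    if (strcmp(out, "A") == 0)   /* stands in for "#define A 0" */
        strcpy(out, "0");
}

/* Start scanning text and prime the lookahead token. */
static void lexinit(const char *text) {
    assert(text != NULL);
    src = text;
    lexm(nexttok);
}

/* Tier 3: hand out the token in hand and fetch the next one. */
static const char *lex(void) {
    strcpy(curtok, nexttok);
    lexm(nexttok);
    return curtok;
}

/* One-token lookahead: inspect the next token without consuming it. */
static const char *peek(void) {
    return nexttok;
}
```

After lexinit("1 + A"), successive lex() calls yield "1", "+" and "0", and peek() shows the upcoming token without consuming it.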
--
bartc
Thiago Adams
2017-05-03 12:16:59 UTC
Post by bartc
Post by Thiago Adams
Post by bartc
Post by Thiago Adams
Post by bartc
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Yes. Although a one pass compiler probably wouldn't need one. (You can
see the AST by running it as 'mcc -ast program.c', it will write it out
to a file.)
Is your preprocessor a separated step?
(In my parser, the preprocessor is inside the scanner. The output
of the scanner is only preprocessed tokens)
Well, it's not a separate pass. Probably the same as yours: the
preprocessing is done as it goes along, which makes it harder.
By preprocessing, I mainly have in mind macro expansion which is where
most of the bugs are going to arise. #include and #if are trivial.
Probably, a separate stage would have been easier (convert entire input
source to tokens then preprocess). But would have been slower, and
unjustified if 99% of the preprocessed input source wasn't macro-expanded.
lexreadtoken() Return next token from actual source text
lexm() Calls lexreadtoken(), deals with macro expansion
lex() Calls lexm(), works a token in hand, allowing
a one-token lookahead. Called from parser.
Without macros with parameters, the middle lexm() function could have
been dispensed with.
For preprocessing only (-E option), only lexm() is needed.)
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if,#else etc I have a kind of state machine to tell when tokens are ignored or not.

#define A 0

1 + /*comment*/ A

The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.

I am planning to collect #define, #undef and comments and put them inside AST nodes.
When an AST node is created, it will ask the scanner "give me all the collected comments and preprocessor directives", then "clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
Thiago Adams
2017-05-06 01:46:49 UTC
Post by Thiago Adams
Post by bartc
Post by Thiago Adams
Post by bartc
Post by Thiago Adams
Post by bartc
The C compiler is 2.5 pass. (The 0.5 pass could probably be absorbed
into the second.)
Do you build an AST for this process?
Yes. Although a one pass compiler probably wouldn't need one. (You can
see the AST by running it as 'mcc -ast program.c', it will write it out
to a file.)
Is your preprocessor a separated step?
(In my parser, the preprocessor is inside the scanner. The output
of the scanner is only preprocessed tokens)
Well, it's not a separate pass. Probably the same as yours: the
preprocessing is done as it goes along, which makes it harder.
By preprocessing, I mainly have in mind macro expansion which is where
most of the bugs are going to arise. #include and #if are trivial.
Probably, a separate stage would have been easier (convert entire input
source to tokens then preprocess). But would have been slower, and
unjustified if 99% of the preprocessed input source wasn't macro-expanded.
lexreadtoken() Return next token from actual source text
lexm() Calls lexreadtoken(), deals with macro expansion
lex() Calls lexm(), works a token in hand, allowing
a one-token lookahead. Called from parser.
Without macros with parameters, the middle lexm() function could have
been dispensed with.
For preprocessing only (-E option), only lexm() is needed.)
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if,#else etc I have a kind of state machine to tell when tokens are ignored or not.
#define A 0
1 + /*comment*/ A
The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.
I am planning to collect #define, #undef and comments and put then inside AST nodes.
When the AST node is created it will ask the scanner "give me all the collected comments and preprocessor" - "Clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
Using this scanner I managed to rebuild source code, with macros, from the AST.
I can emit the macro call instead of its expansion in places that I choose.
When I get a token, I can ask whether it is at the beginning of some macro expansion. I can also ask whether the token is at the end of the macro expansion.

#define NULL ((void*)0)
int * p = NULL;

So, when I parse the primary-expression ((void*)0) I can ask whether I am at the beginning of some macro expansion. Token '(' is the beginning of the expansion of NULL.
When the primary-expression ends, I ask whether the macro expansion also ended exactly at the end of the primary-expression.
If this is true, then I replace the whole primary-expression with the macro call; otherwise the macro expansion is used.
I did this in some places (some grammar productions).

I don't know if anyone else is interested in this subject of rebuilding the source code, or of treating the preprocessor as a parser detail. I also managed to make keeping #includes optional. I can generate the amalgamation if desired, or keep the includes. External includes are always kept so the source code can be used on another platform without a rebuild.

For my personal use, I am not far from having source code that I can rebuild completely, as an amalgamation or as individual files.
I don't use macros much.
bartc
2017-05-06 11:00:21 UTC
Post by Thiago Adams
Post by Thiago Adams
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if,#else etc I have a kind of state machine to tell when tokens are ignored or not.
#define A 0
1 + /*comment*/ A
The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.
I am planning to collect #define, #undef and comments and put them inside AST nodes.
When an AST node is created it will ask the scanner "give me all the collected comments and preprocessor directives" and then "clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
Using this scanner I managed to rebuild the source code from the AST with macros intact.
I can emit the macro call instead of its expansion in places that I choose.
When I get a token, I can ask whether it is at the beginning of a macro expansion, and also whether it is at the end of one.
#define NULL ((void*)0)
int * p = NULL;
So, when I parse the primary-expression ((void*)0), I can ask whether I am at the beginning of a macro expansion. The token '(' is the beginning of the expansion of NULL.
When the primary-expression ends, I ask whether the macro expansion also ended exactly at the end of the primary-expression.
If so, I replace the whole primary-expression with the macro call; otherwise the expansion is used.
I did this in some places (some grammar productions).
I don't know if anyone else is interested in this subject of rebuilding the source code, or in the preprocessor as a parser detail. I can also keep or drop #includes: I can generate the amalgamation if desired, or keep the includes. External includes are always kept, so the source code can be used on another platform without rebuilding.
Well, it's interesting that some of this stuff is possible to do. And it
is intriguing how it might work.

So it sounds like macro-calls need to be well-formed, but how about
#defines; start with a+a+a, then do this:

a +
#define a x
a + a;

The current PP rules say that this now becomes a+x+x, but a typical AST
for such an expression would be (using numeric suffixes to make it clearer):

(add2 (add1 a1 a2) a3)

where does the #define go? In the source, it's just after add1 so would
be here:

(add2 (add1 (#define 'a' 'x') a1 a2) a3)

but its influence would apply to a2 and a3, not a1 and a2. The scope of
#defines is out of kilter with that of block scope and expression precedence.

Or do #defines, the ones to be part of the AST, also need to be properly
placed and follow similar rules to expressions and statements?
--
Bartc
David Kleinecke
2017-05-06 19:01:34 UTC
Post by bartc
Well, it's interesting that some of this stuff is possible to do. And it
is intriguing how it might work.
So it sounds like macro-calls need to be well-formed, but how about
a +
#define a x
a + a;
The current PP rules say that this now becomes a+x+x, but a typical AST
(add2 (add1 a1 a2) a3)
where does the #define go? In the source, it's just after add1 so would
(add2 (add1 (#define 'a' 'x') a1 a2) a3)
but it's influence would apply to a2 and a3, not a1 and a2. The scope of
#defines is out of kilter with that block-scope and expression precedence.
Or do #defines, the ones to be part of the AST, also need to be properly
placed and follow similar rules to expressions and statements?
--
Bartc
The preprocessor and the parser are different modules. The
parser knows nothing about the #define. The preprocessor
knows nothing about the meaning of "+".

The preprocessor can be (= usually is?) implemented on a token
by token basis incrementally. It generates a new token whenever
the parser asks for a new token. After the pre-processor has
sent "a" and "+" the next call for a token results in processing
the #define, then reading in an "a" recognizing it as a macro
name, expanding the macro and finally returning the first token
in the macro expansion. After passing all the expansion tokens
(in this case just a single "x") the pre-processor reads in
another token ("+") and then finally another macro expansion.

But then you already knew all that.
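
A minimal sketch in C of that pull model, for anyone who wants to see it run. Everything here (Scanner, next_token, the fixed-size tables) is a hypothetical illustration, not anybody's real preprocessor. It assumes object-like macros whose body is a single token, does not rescan expansions, and ignores every directive other than #define.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MACROS 16
#define MAX_TEXT   32

typedef struct {
    const char *src;                      /* remaining input */
    bool at_line_start;                   /* '#' only counts here */
    char names[MAX_MACROS][MAX_TEXT];
    char bodies[MAX_MACROS][MAX_TEXT];    /* single-token bodies only */
    int  nmacros;
} Scanner;

void scanner_init(Scanner *s, const char *src)
{
    s->src = src;
    s->at_line_start = true;
    s->nmacros = 0;
}

static bool is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copies one identifier/number, or one punctuation character, into out.
   Never consumes a newline; out is empty if nothing could be read. */
static void read_word(Scanner *s, char *out)
{
    size_t n = 0;
    if (is_word_char(*s->src)) {
        while (is_word_char(*s->src) && n < MAX_TEXT - 1)
            out[n++] = *s->src++;
    } else if (*s->src != '\0' && *s->src != '\n') {
        out[n++] = *s->src++;
    }
    out[n] = '\0';
}

static void skip_blanks(Scanner *s)
{
    while (*s->src == ' ' || *s->src == '\t')
        s->src++;
}

/* Stores the next token in out (room for MAX_TEXT chars) and returns true,
   or returns false at end of input. Directives are processed on the way. */
bool next_token(Scanner *s, char *out)
{
    for (;;) {
        while (*s->src == ' ' || *s->src == '\t' || *s->src == '\n') {
            if (*s->src == '\n')
                s->at_line_start = true;
            s->src++;
        }
        if (*s->src == '\0')
            return false;
        if (s->at_line_start && *s->src == '#') {
            char word[MAX_TEXT];
            s->src++;
            skip_blanks(s);
            read_word(s, word);
            if (strcmp(word, "define") == 0 && s->nmacros < MAX_MACROS) {
                skip_blanks(s);
                read_word(s, s->names[s->nmacros]);
                skip_blanks(s);
                read_word(s, s->bodies[s->nmacros]);
                if (s->bodies[s->nmacros][0] != '\0')  /* empty bodies ignored */
                    s->nmacros++;
            }
            while (*s->src != '\0' && *s->src != '\n')  /* drop rest of line */
                s->src++;
            continue;
        }
        s->at_line_start = false;
        read_word(s, out);
        for (int i = s->nmacros - 1; i >= 0; i--) {     /* latest definition wins */
            if (strcmp(out, s->names[i]) == 0) {
                strcpy(out, s->bodies[i]);
                break;
            }
        }
        return true;
    }
}
```

Fed the three-line example from earlier in the thread, successive calls hand the parser a, +, x, +, x and ; in that order: the #define is consumed between two ordinary requests for a token.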
Thiago Adams
2017-05-06 19:15:39 UTC
Post by David Kleinecke
The preprocessor and the parser are different modules. The
parser knows nothing about the #define, The preprocessor
knows nothing about the meaning of "+".
In other threads I have raised the question of integrating the parser and preprocessor in a way that nobody would notice (if desired), or that they would notice for good reasons.
Post by David Kleinecke
The processor can be (= usually is?) implemented on a token
by token basis incrementally. It generates a new token whenever
the parser asks for a new token. After the pre-processor has
sent "a" and "+" the next call for a token results in processing
the #define, then reading in an "a" recognizing it as a macro
name, expanding the macro and finally returning the first token
in the macro expansion. After passing all the expansion tokens
(in this case just a single "x") the pre-processor reads in
another token ("+") and then finally another macro expansion.
But then you already knew all that.
I do all the processing inside the scanner.
The parser just asks for the next token.

a +
#define a x
a + a;

Next -> 'a'
Next -> '+'
Next -> 'x'
Next -> '+'
Next -> 'x'
Next -> ';'
Next -> EOF

But in between, I can ask the scanner for the #defines it has collected.

a +
#define a x
a + a;

Next -> 'a'
Next -> '+'

Scanner, did you collect anything?
If yes, put it in the beginning-list of the next node.

Scanner, are you at the beginning of a macro expansion?
Next -> 'x'
Scanner, are you at the end of a macro expansion? If yes, and I am in
a primary-expression, then I give the node an 'expanded call'
that can be used later to generate the macro call instead of
the expansion.

Next -> '+'

//same
Next -> 'x'
//same

Next -> ';'
Next -> EOF
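
The "did you collect anything?" hand-off in the trace above could look roughly like this in C. It is only a sketch under assumptions: Collector, Node, collect and make_node are hypothetical names, and the lists are fixed-size arrays of borrowed strings to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8

/* What the scanner has seen (directives, comments) since the last node. */
typedef struct {
    const char *pending[MAX_PENDING];
    size_t      npending;
} Collector;

/* An AST node that remembers what preceded its first token. */
typedef struct {
    const char *token;
    const char *leading[MAX_PENDING];
    size_t      nleading;
} Node;

/* Called by the scanner whenever it skips over a #define or a comment.
   Items beyond MAX_PENDING are silently dropped in this sketch. */
void collect(Collector *c, const char *text)
{
    if (c->npending < MAX_PENDING)
        c->pending[c->npending++] = text;
}

/* Called by the parser when it creates a node: take everything collected
   so far, then clear the list so the next node starts empty. */
Node make_node(Collector *c, const char *token)
{
    Node n = { .token = token, .nleading = c->npending };
    memcpy(n.leading, c->pending, c->npending * sizeof c->pending[0]);
    c->npending = 0;
    return n;
}
```

In the example, the #define is collected while the scanner is looking for the token after '+', so it ends up in the leading list of the first 'x' node and nowhere else.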
David Kleinecke
2017-05-06 21:03:16 UTC
Off hand I would say you just said the same thing I did but at
greater length.

As nearly as I can tell what you call the "scanner" is the same
thing I call the "tokenizer".

I do not think of what we are talking about as "integration". I
look at it as (implicit) piping. In my computer architecture the
preprocessor is implemented as seven programs - the first seven
translation phases (only the first half of the seventh) defined
by the standard. The first phase reads successive characters from
an implementation-defined source. The second and third phases
read characters from (meaning by a call to) the earlier phase.
Phases four through seven read preprocessing tokens from the
earlier phase. The parser (which I identify with the first half of
the last sentence in the seventh phase) reads parser tokens from
the seventh phase. Thus a pipeline of nine successive programs plus
whatever happens in the backend to complete translating parser
output to actual machine code.
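
As a rough illustration of that "each phase calls the earlier phase" arrangement, here is a sketch in C of the character-level end of such a pipeline. The names (Pipeline, phase1, phase2, phase3) are made up for the example. It covers only phase 1, line splicing from phase 2, and the comment-to-space part of phase 3; trigraphs, // comments, and string and character literals are deliberately left out.

```c
#include <stdbool.h>
#include <stdio.h>   /* EOF */

typedef struct {
    const char *p;       /* phase 1: stands in for the source file */
    bool has_unread;     /* one character pushed back onto phase 2 */
    int  unread;
} Pipeline;

void pipeline_init(Pipeline *pl, const char *src)
{
    pl->p = src;
    pl->has_unread = false;
    pl->unread = 0;
}

/* Phase 1: deliver source characters one at a time. */
static int phase1(Pipeline *pl)
{
    return *pl->p != '\0' ? (unsigned char)*pl->p++ : EOF;
}

/* Phase 2: delete every backslash-newline pair (line splicing).
   For brevity it peeks directly at phase 1's input. */
static int phase2(Pipeline *pl)
{
    if (pl->has_unread) {
        pl->has_unread = false;
        return pl->unread;
    }
    for (;;) {
        int c = phase1(pl);
        if (c == '\\' && *pl->p == '\n') {
            pl->p++;
            continue;
        }
        return c;
    }
}

/* Part of phase 3: replace each block comment with one space. */
int phase3(Pipeline *pl)
{
    int c = phase2(pl);
    if (c != '/')
        return c;

    int d = phase2(pl);
    if (d != '*') {              /* not a comment: push d back */
        pl->unread = d;
        pl->has_unread = true;
        return c;
    }

    int prev = 0;
    for (;;) {
        c = phase2(pl);
        if (c == EOF)
            return EOF;          /* unterminated comment */
        if (prev == '*' && c == '/')
            return ' ';
        prev = c;
    }
}
```

Because phase 3 only ever sees what phase 2 returns, a comment whose delimiters are split by backslash-newline is still recognised without phase 3 knowing anything about splicing.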
Thiago Adams
2017-05-08 12:53:02 UTC
Post by David Kleinecke
Off hand I would say you just said the same thing I did but at
greater length.
As nearly as I can tell what you call the "scanner" is the same
thing I call the "tokenizer".
I do not think of what we are talking about as "integration". I
look at it as (implicit) piping.
If the parser has the information of preprocessor and vice versa we can
do and check a lot of things. This is the integration.

For instance
#if sizeof(int)
preprocessor can use sizeof

#if symbol_defined(F)

preprocessor can ask if function F was defined.


int A = 1;

The parser could tell that A is a macro and that it will be replaced by "B".
(This may or may not be an error, of course.)

One more:

void F1(){}
#define F1 F2
void F2(){}

int main()
{
F1();
return 0;
}

The preprocessor could check whether F1 was already defined and emit a warning when redefining F1.
"Warning: You are redefining an existing symbol F1"
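
A minimal sketch in C of what that check might look like, assuming the preprocessor can see a list of names the parser has declared so far. The function name and the flat array standing in for the symbol table are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when a #define of macro_name would hide a name that the
   parser has already declared. 'declared' is a stand-in for whatever
   symbol table the real compiler keeps. The comparison is exact, so
   names that differ only in case are treated as different. */
bool macro_hides_symbol(const char *const *declared, size_t n,
                        const char *macro_name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(declared[i], macro_name) == 0)
            return true;
    }
    return false;
}
```

On seeing "#define F1 F2" after "void F1(){}", the directive handler would call this with "F1" and print the warning when it returns true.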
bartc
2017-05-08 13:26:23 UTC
Post by Thiago Adams
If the parser has the information of preprocessor and vice versa we can
do and check a lot of things. This is the integration.
For instance
#if sizeof(int)
preprocessor can use sizeof
#if symbol_defined(F)
preprocessor can ask if function F was defined.
I can see some use for this. But it can be done now by defining a
corresponding macro at the same time as F.
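
A small sketch of that workaround in standard C. HAVE_F, f_calls and call_f_if_present are hypothetical names picked for the example; the only real mechanism used is #ifdef.

```c
static int f_calls = 0;

/* The function whose existence later code wants to test for. */
void F(void)
{
    f_calls++;
}
#define HAVE_F 1   /* marker macro, written by hand right next to F */

/* Calls F only if the marker says it exists; returns 1 if it was called. */
int call_f_if_present(void)
{
#ifdef HAVE_F
    F();
    return 1;
#else
    return 0;
#endif
}
```

The cost is that the marker has to be kept in step with the function by hand; nothing checks that HAVE_F and F are defined or removed together.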
Post by Thiago Adams
int A = 1;
Parser could tell that A is a macro and it will be replaces by "B".
(Can be a error or not of course)
void F1(){}
#define F1 F2
void F2(){}
int main()
{
F1();
return 0;
}
Preprocessor could check if F1 was defined and emit warning when redefining F1.
"Warning: You are redefining an existing symbol F1"
This sounds less useful. Defining F1 twice is already an error, and will
be picked up by other passes. Why would the preprocessor want to get
involved?

Also, at what point is F1 'defined'? For example, at which of these
points marked with '#':

void F1
#
()
#
{
#
}
#

symbol_defined(F1) will return true at the last #, but how about the
others? I would say false for the first two, and true for the third. But
then, some compilers may insert implementation-specific stuff between )
and {, and you want symbol_defined to return true for the preprocessor
to be able to influence it.

And, 'symbol_defined' is too generic. F1 might be a filescope variable
name, it might be a parameter name, or enum name, or member name, or
local variable (some now out of scope, but they might still count as
having been defined).

What about when F1 is a struct or enum tag? This comes back to the
processor having intimate access to the symbol table and the current
state of the parser.
--
bartc
Thiago Adams
2017-05-08 14:10:13 UTC
Post by bartc
Post by Thiago Adams
If the parser has the information of preprocessor and vice versa we can
do and check a lot of things. This is the integration.
For instance
#if sizeof(int)
preprocessor can use sizeof
#if symbol_defined(F)
preprocessor can ask if function F was defined.
I can see some use for this. But it can be done now by defining a
corresponding macro at the same time as F.
Post by Thiago Adams
int A = 1;
Parser could tell that A is a macro and it will be replaces by "B".
(Can be a error or not of course)
void F1(){}
#define F1 F2
void F2(){}
int main()
{
F1();
return 0;
}
Preprocessor could check if F1 was defined and emit warning when redefining F1.
"Warning: You are redefining an existing symbol F1"
This sounds less useful. Defining F1 twice is already an error, and will
be picked up by other passes. Why would the preprocessor want to get
involved?
F1 was defined once. This sample compiles.
Post by bartc
Also, at what point is F1 'defined'? For example, at which of these
void F1
#
()
#
{
#
}
#
symbol_defined(F1) will return true at the last #, but how about the
others? I would say false for the first two, and true for the third. But
then, some compilers may insert implementation-specific stuff between )
and {, and you want symbol_defined to return true for the preprocessor
to be able to influence it.
And, 'symbol_defined' is too generic. F1 might be a filescope variable
name, it might be a parameter name, or enum name, or member name, or
local variable (some now out of scope, but they might still count as
having been defined).
What about when F1 is a struct or enum tag? This comes back to the
processor having intimate access to the symbol table and the current
state of the parser.
I am not missing this 'symbol_defined' at this moment.
I have used something like this just once, in VC++ (__if_exists).
I used it to activate a switch case if some function was defined:
if an OnPaint function was defined, the WM_PAINT case was activated in a
switch that had everything else waiting.

I used it as a sample of what could be done; sizeof as well.
If your point is that it is hard to define the rules, I agree with you.
But at the same time, the intention is often well defined.

This other sample also compiles and prints 3.

#include <stdio.h>

const int Pi = 1;
#define PI 3.14

int main()
{
printf("%d", (int) PI);
return 0;
}

I would like to see a warning:
"Warning you are hiding the variable int PI"

The well defined intention is to create a constant.

Fortunately I have not had this problem.
I can remember two problems with macros.
The min and max macros defined in the Windows headers conflict with std::min,
and windows.h also defines macros that replace function names with their
A or W versions. Both gave me compiler/linker errors, which is good.
bartc
2017-05-08 14:36:14 UTC
Post by Thiago Adams
Post by bartc
This sounds less useful. Defining F1 twice is already an error, and will
be picked up by other passes. Why would the preprocessor want to get
involved?
F1 was defined once. This sample compiles.
Oh, OK, I thought it was the other way around.
Post by Thiago Adams
Post by bartc
Also, at what point is F1 'defined'? For example, at which of these
void F1
#
()
#
{
#
}
#
symbol_defined(F1) will return true at the last #, but how about the
others? I would say false for the first two, and true for the third. But
then, some compilers may insert implementation-specific stuff between )
and {, and you want symbol_defined to return true for the preprocessor
to be able to influence it.
And, 'symbol_defined' is too generic. F1 might be a filescope variable
name, it might be a parameter name, or enum name, or member name, or
local variable (some now out of scope, but they might still count as
having been defined).
What about when F1 is a struct or enum tag? This comes back to the
processor having intimate access to the symbol table and the current
state of the parser.
I am not missing this 'symbol_defined' at this moment.
I have used this just once in VC++ (__if_exists)
I used it to activate a switch case if some function was defined.
I looked up __if_exists. It doesn't look like a preprocessor directive,
and is presented as though it was normal syntax. Perhaps it is.

I don't have much objection to custom statements of this kind. They fit
in with the parser and with normal syntax, and they follow the same rules.

(In fact I have one or two such statements of my own, for example:

showtype expr;

which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)

Statements such as __if_exists(name){} would be very easy to implement.
In contrast to trying to do this at preprocessor level. Unless what can
be written inside {...} can be unstructured elements (such as a just
'{a[i] = }', with the rest coming outside the {}), but I didn't get that
impression from the msdn page.
Post by Thiago Adams
if OnPaint function was defined, then I activate WM_PAINT case on
switch that had everything waiting.
I used it as a sample of what could be done; sizeof as well.
If your point is that it is hard to define the rules, I agree with you.
But at the same time, the intention is often well defined.
Well, the difference between us is that I'd rather not have the
preprocessor integrated. (Actually, I would rather it disappeared
completely so that I'd be left with something I can understand, be in
control of, and have more confidence that it will work.)

So I'm looking at any excuse to keep it at arm's length!
Post by Thiago Adams
This other sample also compile and prints 3.
#include <stdio.h>
const int Pi = 1;
#define PI 3.14
int main()
{
printf("%d", (int) PI);
return 0;
}
"Warning you are hiding the variable int PI"
Pi and PI use different sets of upper and lower case, so are different
identifiers. If you enabled such a warning, most programs would give
warnings about nothing else. It seems a common C idiom to use case on
the same identifier to allow one as a type, the other as an idiom.
--
bartc
bartc
2017-05-08 14:43:24 UTC
Post by bartc
Pi and PI use different sets of upper and lower case, so are different
identifiers. If you enabled such a warning, most programs would give
warnings about nothing else. It seems a common C idiom to use case on
the same identifier to allow one as a type, the other as an idiom.
Not an idiom but a variable, or anything else that isn't a type.
Basically it means C programmers show a lack of imagination when they
apply a different case instead of thinking up a new name.

One interesting (and easy) way of checking such uses would be a
preprocessor that converted all identifiers to one case (pretend it was
case-insensitive), and see what comes up. I suspect a lot of sources
would fail such a test as there would be clashes.
--
bartc
David Kleinecke
2017-05-08 19:06:14 UTC
Post by bartc
Post by bartc
Pi and PI use different sets of upper and lower case, so are different
identifiers. If you enabled such a warning, most programs would give
warnings about nothing else. It seems a common C idiom to use case on
the same identifier to allow one as a type, the other as an idiom.
Not an idiom but a variable, or anything else that isn't a type.
Basically it means C programmers show a lack of imagination when they
apply a different case instead of thinking up a new name.
One interesting (and easy) way of checking such uses would be a
preprocessor that converted all identifiers to one case (pretend it was
case-insensitive), and see what comes up. I suspect a lot of sources
would fail such a test as there would be clashes.
Easy enough if you want to. When the tokenizer recognizes
an identifier it must check whether the identifier has
already been tokenized. Just change all the identifiers to
lower case before you test for already tokenized. But why
bother?
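
For what it is worth, the check itself is only a few lines. The sketch below, in C, takes a list of identifiers (assumed to have been gathered by a tokenizer already) and counts the pairs that would collide if case were ignored. The function names are hypothetical, and the quadratic loop is chosen for brevity rather than speed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a and b are equal once letter case is ignored. */
static bool same_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* both must end at the same point */
}

/* Counts pairs of identifiers that are spelled differently but would
   become the same name in a case-insensitive language. */
size_t count_case_clashes(const char *const *ids, size_t n)
{
    size_t clashes = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(ids[i], ids[j]) != 0 &&
                same_ignoring_case(ids[i], ids[j]))
                clashes++;
        }
    }
    return clashes;
}
```

Given the identifiers Pi, PI, main and printf from the earlier sample, it reports one clash.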
Thiago Adams
2017-05-08 18:02:52 UTC
On Monday, May 8, 2017 at 11:36:17 AM UTC-3, Bart wrote:
[...]
Post by bartc
I looked up __if_exists. It doesn't look like a preprocessor directive,
and is presented as though it was normal syntax. Perhaps it is.
It's not a preprocessor feature.
C++ now officially has if constexpr.
if constexpr (DEBUG)
{
log();
}
constexpr can also be used in other places where the preprocessor used to be required.
But the preprocessor is still required; after many decades it is still in use.

I don't like having two ways of doing the same thing. C++ is trying to kill the preprocessor, but it is not there yet.

[...]
Post by bartc
Post by Thiago Adams
I used it as a sample of what could be done; sizeof as well.
If your point is that it is hard to define the rules, I agree with you.
But at the same time, the intention is often well defined.
Well, the difference between us is that I'd rather not have the
preprocessor integrated. (Actually, I would rather it disappeared
completely so that I'd be left with something I can understand, be in
control of, and have more confidence that it will work.)
C code can be safe and correct, despite many potential pitfalls.
People who program in C learn how to manage these problems, and they
follow some patterns.
I think the patterns for preprocessor usage are there.
Maybe a tool could gather statistics on GitHub and report the patterns.

So, I think that some of these patterns can be integrated into compilers
whose implementation understands preprocessing and compilation together.
I guess that one of these correct patterns is the creation of constants.
Probably most of us are doing this.
I would also like to see statistics on the position of #define, #undef and #if in source code.
bartc
2017-05-08 18:46:32 UTC
Post by Thiago Adams
I don't like two ways of doing the same thing. C++ is trying to kill preprocessor but it is not there yet.
One good point about C++ then! (Mind you it probably wants to replace it
with something worse, advanced templates perhaps.)

An 'if (special) stmt' feature is straightforward however. That
'special' expression is just some compiler-specific inquiry, and the
whole thing is treated like an ordinary 'if'.

Almost. I expect that it would allow:

if (special)
int x;
else
double x;

(both x's in the same scope), which is not possible with an ordinary if.
So not /quite/ so straightforward.
Post by Thiago Adams
Post by bartc
Well, the difference between us is that I'd rather not have the
preprocessor integrated. (Actually, I would rather it disappeared
completely so that I'd be left with something I can understand, be in
control of, and have more confidence that it will work.)
C code can be safe and correct, despite many potential pitfalls.
People who program in C learn how to manage these problems and they
follow some patterns.
I think the patterns for preprocessor usage are there.
Maybe a tool could gather statistics on GitHub and report the patterns.
So, I think that some of these patterns can be integrated into compilers
whose implementation understands preprocessing and compilation together.
I guess that one of these correct patterns is the creation of constants.
Probably most of us are doing this.
I would like to see statistics on the position of #define, #undef and #if in source code as well.
The preprocessor was added decades ago. It was a crude way to achieve
certain things that ought to be done with a little more sophistication now:

#includes => imports

#define C expr => constant T C = expr; // with proper scopes

#define max(...) => Just add a proper max operator!

#define LEN(a) (sizeof(a)/sizeof((a)[0]))
=> And a proper array length attribute

#define FOR(..) => /And/ a proper for statement

Some types of macro are defined over and over again by nearly everyone,
which gives you a hint that perhaps they ought to be in the language. Or
at the least, be a standard macro.

#if => No real alternative, but should be used sparingly
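
A minimal sketch of how far standard C already gets with some of these, without any new language features. The names TABLE_SIZE, max_int and largest_in_table are invented for illustration: the enum constant stands in for a '#define C expr', the inline function for a max() macro, and LEN is the usual array-length macro with the outer parentheses that keep it safe inside larger expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Scoped integer constant instead of '#define TABLE_SIZE 4'. */
enum { TABLE_SIZE = 4 };

/* Type-checked stand-in for a max() macro; each argument is evaluated once. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Array-length macro. Only valid on true arrays, not on pointers. */
#define LEN(a) (sizeof(a) / sizeof((a)[0]))

static const int table[TABLE_SIZE] = { 3, 9, 4, 1 };

/* Hypothetical helper: largest element of the fixed table above. */
static int largest_in_table(void)
{
    int best = table[0];
    assert(LEN(table) == TABLE_SIZE);
    for (size_t i = 1; i < LEN(table); i++)
        best = max_int(best, table[i]);
    return best;
}
```

None of this gives a generic max or a built-in length attribute, which is the point being made above; it only shows what can be done today.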
--
bartc
Thiago Adams
2017-05-08 19:23:25 UTC
On Monday, May 8, 2017 at 11:36:17 AM UTC-3, Bart wrote:
[...]
Post by bartc
showtype expr;
which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)
Can you implement gcc typeof using this?
I want to implement this as well for static analysis.
bartc
2017-05-08 20:08:16 UTC
Post by Thiago Adams
[...]
Post by bartc
showtype expr;
which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)
Can you implement gcc typeof using this?
I want to implement this as well for static analysis.
OK, a challenge.

I implemented a 'typeof' that appeared to work, in something under ten
minutes.

But it probably needs more testing. I suspect it won't work inside a
parameter list to get the type of an earlier parameter (because they are
not properly processed until later).

And if F is a function, then typeof(F) gives a function pointer type,
not the return type of F. But that can be fixed. (I don't know how gcc's
typeof works or what extra things it does.)

A basic typeof() doesn't look difficult. Here are the extra lines needed
(as C):

case ktypeofsym:
    lex();
    skipsymbol(lbracksym);
    p = readexpression();
    skipsymbol(rbracksym);
    if (d.typeno || mod) {
        serror("typeof");
    }
    d.typeno = p->mode;
    break;

and there will be a handful of lines elsewhere defining 'typeof' or
detecting the symbol as now being a type-starter.
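
For comparison, this is roughly how typeof tends to get used with gcc, as a sketch only. __typeof__ is the spelling gcc and clang accept in any language mode; SWAP and order_pair are made-up names, not anything from the compiler discussed here.

```c
#include <assert.h>

/* Swap two lvalues of the same type. The temporary takes its type from
   the first argument, so the macro works for int, double, structs, etc. */
#define SWAP(a, b)                      \
    do {                                \
        __typeof__(a) swap_tmp = (a);   \
        (a) = (b);                      \
        (b) = swap_tmp;                 \
    } while (0)

/* Hypothetical helper: put two doubles in ascending order. */
static void order_pair(double *lo, double *hi)
{
    assert(lo != 0 && hi != 0);
    if (*lo > *hi)
        SWAP(*lo, *hi);
}
```

Calling order_pair on 2.5 and 1.0 leaves them as 1.0 and 2.5; the same SWAP works unchanged on two ints.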
--
bartc
s***@casperkitty.com
2017-05-08 20:26:41 UTC
Post by bartc
And if F is a function, then typeof(F) gives a function pointer type,
not the return type of F. But that can be fixed. (I don't know how gcc's
typeof works or what extra things it does.)
I would think that typeof(functionName) should yield a pointer-to-function
type, and typeof(functionName()) or typeof(pointerToFunction()) should yield
the return type if there is no support for function overloading, or if all
extant overloads have the same return type.
Keith Thompson
2017-05-08 20:32:55 UTC
Post by s***@casperkitty.com
Post by bartc
And if F is a function, then typeof(F) gives a function pointer type,
not the return type of F. But that can be fixed. (I don't know how gcc's
typeof works or what extra things it does.)
I would think that typeof(functionName) should yield a pointer-to-function
type, and typeof(functionName()) or typeof(pointerToFunction()) should yield
the return type if there is no support for function overloading, or if all
extant overloads have the same return type.
I would expect that typeof(functionName) would yield a function type,
i.e., that the operand of typeof would be another context in which a
function designator is not "converted" to pointer type. Likewise for
typeof(arrayName).
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-05-08 20:40:11 UTC
Post by Keith Thompson
I would expect that typeof(functionName) would yield a function type,
i.e., that the operand of typeof would be another context in which a
function designator is not "converted" to pointer type. Likewise for
typeof(arrayName).
Since arrays are objects, deferring array-to-pointer conversion will yield
an object type which is different from the pointer type. Since functions
are not objects, the only distinction I can see between a function type and
a function pointer type, outside of function declarations or definitions,
would be that there are some contexts in which a function-pointer type would
be legal and a function type would not. Unless a language is intending to
allow:

int function1(int);
typeof(function1) function2;

as an alternative way of writing:

int function1(int);
int function2(int);

which would seem a little weird, I can't think of any cases where having
typeof() yield a function type would have a different defined meaning from
having it yield a function-pointer type.
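
For what it's worth, GCC's typeof (as far as I know, clang's too) does take the "function type" reading, so the declaration form above is accepted there. A small sketch, using the GNU spelling __typeof__; samples, samples_copy, fp and call_through_pointer are illustrative names only.

```c
#include <assert.h>

int function1(int x) { return x + 1; }

/* With GCC this declares function2 as 'int function2(int)': the function
   designator is not converted to a pointer inside typeof. */
__typeof__(function1) function2;

/* The definition has to match that declaration, or the compiler complains. */
int function2(int x) { return function1(x) * 2; }

/* Arrays behave the same way: this is float[5], not float *. */
float samples[5];
__typeof__(samples) samples_copy;

/* Spelling out the '*' gives the pointer type explicitly. */
__typeof__(function1) *fp = function2;

static int call_through_pointer(int x)
{
    assert(fp != 0);
    return fp(x);
}
```

So one observable difference is that the pointer type has to be written with an explicit '*', and sizeof samples_copy is the size of the whole array.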
bartc
2017-05-08 20:33:36 UTC
Post by s***@casperkitty.com
Post by bartc
And if F is a function, then typeof(F) gives a function pointer type,
not the return type of F. But that can be fixed. (I don't know how gcc's
typeof works or what extra things it does.)
I would think that typeof(functionName) should yield a pointer-to-function
type, and typeof(functionName()) or typeof(pointerToFunction()) should yield
the return type if there is no support for function overloading, or if all
extant overloads have the same return type.
I didn't think of trying typeof(functionname())! That seems to work,
although it might be tricky having to arrange a dummy set of arguments.
Leaving them out generates a "too few arguments" error.

But I've just tried the same code with gcc and it seems to work exactly
the same way.
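
A sketch of that behaviour with gcc, under the assumption that the GNU __typeof__ spelling is available. The call inside typeof still needs a well-formed argument list, but it is only type-checked, not executed; scale, call_count and half_of are names made up for the example.

```c
#include <assert.h>

static int call_count = 0;

/* Hypothetical function whose calls can be counted. */
static double scale(int n)
{
    call_count++;
    return n * 0.5;
}

/* 'result' gets scale()'s return type. The dummy call scale(0) inside
   __typeof__ is never evaluated, so it does not bump call_count. */
static double half_of(int n)
{
    __typeof__(scale(0)) result;
    assert(sizeof result == sizeof(double));
    result = scale(n);
    return result;
}
```

After one call to half_of, call_count is 1, not 2.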
--
bartc
Thiago Adams
2017-05-08 20:30:28 UTC
Post by bartc
Post by Thiago Adams
[...]
Post by bartc
showtype expr;
which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)
Can you implement gcc typeof using this?
I want to implement this as well for static analysis.
OK, a challenge.
I implemented a 'typeof' that appeared to work, in something under ten
minutes.
But it probably needs more testing. I suspect it won't work inside a
parameter list to get the type of an earlier parameter (because they are
not properly processed until later).
And if F is a function, then typeof(F) gives a function pointer type,
not the return type of F. But that can be fixed. (I don't know how gcc's
typeof works or what extra things it does.)
A basic typeof() doesn't look difficult. Here are the extra lines needed
lex();
skipsymbol(lbracksym);
p = readexpression();
skipsymbol(rbracksym);
if (d.typeno || mod) {
serror("typeof");
}
d.typeno = p->mode;
break;
and there will be a handful of lines elsewhere defining 'typeof' or
detecting the symbol as now being a type-starter.
(
I can't understand your code. :D
But it is interesting to see how the style can be so different from mine.
I guess, mostly because I don't use global variables.
For instance:

p = readexpression();

I ask myself "read from where"?
In my code I can see in each function what goes in and out.

)

The function return type and expression type is useful in
static analysis to show compile errors in general. Maybe this
is not your priority because your compiler is so young.
Did you show some message for invalid types?

A step further, for static analysis is to add more
hidden modifiers.

for instance

int * Get()
{
    int *p = malloc(sizeof(int));
    return p;
}
Let's say malloc returns a hidden modifier like const but this modifier is "maybe null".

Then the int *p is "contaminated" with this modifier and the result of Get is contaminated as well. The implementation of Get then contaminates the declaration of Get.

Then at some distant point of your code when you do:

int *p = Get();
*p = 1;
you can receive a warning.
"p can be null".

Everything could be checked with just one annotation in malloc.
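
A minimal sketch of the code such a warning would push people to write, in plain C with no annotations (the "maybe null" modifier itself is hypothetical and not shown). Get is the function from above; store_one is an invented caller.

```c
#include <assert.h>
#include <stdlib.h>

/* malloc may return NULL, so Get "may return NULL" as well. */
static int *Get(void)
{
    int *p = malloc(sizeof *p);
    return p;
}

/* Hypothetical caller: tests the pointer before dereferencing it, which
   is what a "p can be null" warning would ask for. Returns 0 on success,
   -1 if the allocation failed. */
static int store_one(int *out)
{
    assert(out != NULL);
    int *p = Get();
    if (p == NULL)
        return -1;
    *p = 1;
    *out = *p;
    free(p);
    return 0;
}
```

Once the null test is there, the analysis could drop the modifier from p on the path after the check.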
bartc
2017-05-08 21:00:56 UTC
Post by Thiago Adams
Post by bartc
A basic typeof() doesn't look difficult. Here are the extra lines needed
lex();
skipsymbol(lbracksym);
p = readexpression();
skipsymbol(rbracksym);
if (d.typeno || mod) {
serror("typeof");
}
d.typeno = p->mode;
break;
and there will be a handful of lines elsewhere defining 'typeof' or
detecting the symbol as now being a type-starter.
(
I can't understand your code. :D
But it is interesting to see how the style can be so different from mine.
I guess, mostly because I don't use global variables.
p = readexpression();
I ask myself "read from where"?
In my code I can see in each function what goes in and out.
Yes, I use global state which knows where it is in the input source.

If there could be half a dozen possible input sources at the same time,
then it would be necessary to specify which one, hence readexpression()
might take an argument.

Note that C's printf does the same sort of thing with 'stdout'. Same
with some input functions, but I'd have to go and look them up
(getchar() is one I think).
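
To make the two styles concrete, a small sketch: printf with the stream left implicit next to fprintf with it spelled out, and a reader whose position is passed in rather than kept in a global. struct reader and read_number are invented for illustration and have nothing to do with the real readexpression().

```c
#include <assert.h>
#include <stdio.h>

/* printf(...) behaves like fprintf(stdout, ...) with the stream implicit. */
static int say_implicit(const char *s) { return printf("%s\n", s); }
static int say_explicit(FILE *out, const char *s) { return fprintf(out, "%s\n", s); }

/* Hypothetical reader state, passed explicitly instead of living in globals. */
struct reader { const char *pos; };

/* Read a non-negative decimal number, advancing the reader past it. */
static int read_number(struct reader *r)
{
    assert(r != NULL && r->pos != NULL);
    int value = 0;
    while (*r->pos >= '0' && *r->pos <= '9')
        value = value * 10 + (*r->pos++ - '0');
    return value;
}
```

With the explicit form, two readers over different inputs can be active at once; with one input source, the extra argument is just noise.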
Post by Thiago Adams
The function return type and expression type is useful in
static analysis to show compile errors in general. Maybe this
is not your priority because your compiler is so young.
I'm interested in a rather different set of errors than a more typical C
compiler. But my hands are largely tied by the language.
Post by Thiago Adams
Did you show some message for invalid types?
Whatever type is yielded by typeof(), is treated the same as any other
type, or any type as it is specified with a typedef. Like typedef, you
can't apply 'unsigned' for example to a typeof; it specifies a complete
base type. (Pointers and arrays are done separately as usual.)
Post by Thiago Adams
A step further, for static analysis is to add more
hidden modifiers.
for instance
int * Get()
{
int *p = malloc(sizeof(int));
return p;
}
Let's say malloc returns a hidden modifier like const but this modifier is "maybe null".
Then the int *p is "contaminated" with this modifier and the result of Get is contaminated as well. The implementation of Get then contaminates the declaration of Get.
Well, with 'const' I was tempted to eliminate it completely (it would be
just a no-op). That would get rid of the contamination. It's introducing
subtle type-handling bugs.
--
Bartc
bartc
2017-05-08 21:18:30 UTC
Post by Thiago Adams
[...]
Post by bartc
showtype expr;
which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)
Can you implement gcc typeof using this?
I want to implement this as well for static analysis.
Since I did 'typeof', I also did 'strtype'. This turns any type into a
string. Example:

#include <stdio.h>

float (*a)[5][10];

int main(void) {
    printf("%s\n", strtype(typeof(a)));
}

Output is not C format however (which might be a bonus):

ref [5][10]float

(Extending the language is too easy. Although this might count as a
debugging aid, I'll stop there. I should really be doing boring stuff
like complex numbers and designated initialisers...)
--
bartc
Thiago Adams
2017-05-09 01:12:27 UTC
Post by bartc
Post by Thiago Adams
[...]
Post by bartc
showtype expr;
which displays the type of the given expression without having to wade
through 250,000 lines of diagnostics, or it might fail shortly after
anyway before the diags are generated.)
Can you implement gcc typeof using this?
I want to implement this as well for static analysis.
Since I did 'typeof', I also did 'strtype'. This turns any type into a
#include <stdio.h>
float (*a)[5][10];
int main(void) {
printf("%s\n", strtype(typeof(a)));
}
ref [5][10]float
(Extending the language is too easy. Although this might count as a
debugging aid, I'll stop there. I should really be doing boring stuff
like complex numbers and designated initialisers...)
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
s***@casperkitty.com
2017-05-09 14:04:46 UTC
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system. I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
Ian Collins
2017-05-09 19:50:22 UTC
Post by s***@casperkitty.com
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system. I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
Simple tools worked well on small simple systems, both hardware and
software. Tools have evolved along with the hardware. Would you really
advocate using an 80s compiler to build a kernel for a current highly
pipelined multicore CPU?
--
Ian
s***@casperkitty.com
2017-05-09 20:31:20 UTC
Post by Ian Collins
Post by s***@casperkitty.com
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system. I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
Simple tools worked well on small simple systems, both hardware and
software. Tools have evolved along with the hardware. Would you really
advocate using an 80s compiler to build a kernel for a current highly
pipelined multicore CPU?
When developing code for something like a Cortex M0 which is a lot closer
to a 68000 than to a highly-pipelined multi-core CPU, I'd say that "1990s
commonplace C" would seem pretty reasonable. Further, even on a fancy CPU,
I would suggest that compatibility and correctness should be more important
than performance, and that defining optimization directives that are suited
to newer compiler technologies would be a lot more effective than trying to
justify incompatibility with code written for 1990s compilers.
Ian Collins
2017-05-09 20:58:40 UTC
Post by s***@casperkitty.com
Post by Ian Collins
Post by s***@casperkitty.com
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system. I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
Simple tools worked well on small simple systems, both hardware and
software. Tools have evolved along with the hardware. Would you really
advocate using an 80s compiler to build a kernel for a current highly
pipelined multicore CPU?
When developing code for something like a Cortex M0 which is a lot closer
to a 68000 than to a highly-pipelined multi-core CPU, I'd say that "1990s
commonplace C" would seem pretty reasonable.
You would only be using a basic RTOS on a Cortex M0, but I would expect
space optimisations to be very handy...
Post by s***@casperkitty.com
Further, even on a fancy CPU,
I would suggest that compatibility and correctness should be more important
than performance, and that defining optimization directives that are suited
to newer compiler technologies would be a lot more effective than trying to
justify incompatibility with code written for 1990s compilers.
How can you be incompatible with something totally unsuited to the task?
Code that was correct (in regards to instruction ordering) on an 8086
would certainly not be correct on a current Xeon.
--
Ian
s***@casperkitty.com
2017-05-09 21:25:27 UTC
Post by Ian Collins
How can you be incompatible with something totally unsuited to the task?
Code that was correct (in regards to instruction ordering) on an 8086
would certainly not be correct on a current Xeon.
Why wouldn't it? Code which relies upon consistent ordering between threads
may only work if the threads have processor affinity set so they use the
same core, and memory orderings that would have been optimally efficient
on the original 80386 may be *inefficient* on the Xeon, but I am unaware
of any architectural changes that would cause OS-neutral 32-bit code running
on a single core to execute with different semantics on the Xeon than on
the original 80386 (I mention the 80386, which came out in 1985, because
Intel effectively discontinued support for 16-bit mode).
David Brown
2017-05-10 12:32:23 UTC
Post by s***@casperkitty.com
Post by Ian Collins
Post by s***@casperkitty.com
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system. I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
Simple tools worked well on small simple systems, both hardware and
software. Tools have evolved along with the hardware. Would you really
advocate using an 80s compiler to build a kernel for a current highly
pipelined multicore CPU?
When developing code for something like a Cortex M0 which is a lot closer
to a 68000 than to a highly-pipelined multi-core CPU, I'd say that "1990s
commonplace C" would seem pretty reasonable.
Fortunately for the rest of the embedded programming world, tool vendors
disagree with you.

I have programmed on a 68000 with a 1990's compiler. (In fact, I last
modified one such program a year or so ago - going back to such old
tools was a serious shock. It was a bit like watching a film on VHS.)
I have also just recently written a small program on a M0+, using gcc.
I haven't the slightest doubt which tools I consider best, and I did not
have any issues with getting my volatiles and other accesses correct.
Post by s***@casperkitty.com
Further, even on a fancy CPU,
I would suggest that compatibility and correctness should be more important
than performance,
That sounds like a good reason for learning how C works, so that you can
write correct code, instead of fantasising about what you think old
compilers used to do.
Post by s***@casperkitty.com
and that defining optimization directives that are suited
to newer compiler technologies would be a lot more effective than trying to
justify incompatibility with code written for 1990s compilers.
In the 1990's, I wrote code that was mostly correct - using volatile
where appropriate, avoiding mistakes in aliasing, etc. I made more
mistakes then than now, since I was new to C at the start. But the
rules of C have not changed. And some of the compilers I used at that
time would optimise based on assumptions that signed integers don't
overflow, and other such things.
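
For anyone who has not met the signed-overflow point: a compiler is allowed to treat 'x + 1 > x' as always true for a signed int, because overflow there is undefined. A minimal sketch of code that does not depend on what overflow does, by testing before adding; checked_add is a made-up name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Add a and b, storing the sum in *sum. Returns false, leaving *sum
   untouched, if the mathematical result does not fit in an int. The
   range test happens before the addition, so no signed overflow occurs. */
static bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

This behaves the same on a 1990s compiler and a current optimising one, since it never relies on wraparound.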
Richard Bos
2017-05-10 10:19:50 UTC
Post by s***@casperkitty.com
Post by Thiago Adams
Something I like about C, compared with the current C++17, is that it does not take 10 years of work to create a parser.
And I like the idea of having some control over the tools I use.
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
Post by s***@casperkitty.com
I find bizarre the
notion that programmers should be required to write code which would require
a complicated optimizer to turn into something that isn't horrible.
It _is_ bizarre, which is why you'll find that the only people espousing
that notion are those who have already proven themselves unable to
understand undefined, unspecified and implementation-defined
behaviour, and the distinction between them. For most programmers, it's
quite simple.

Richard
s***@casperkitty.com
2017-05-10 16:51:43 UTC
Post by Richard Bos
Post by s***@casperkitty.com
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
I think the 1974 C Reference Manual is probably a good indication of what
C was designed to be. Would you disagree? The PDP-11 architecture has a
number of features which are shared by just about every general-purpose
microcomputer ever made, and more that are shared with every popular one
other than the 8086. The language described in the 1974 Reference Manual
allows programmers to exploit those features. Although it makes sense to
say that attempts to exploit such features are likely to fail on machines
that lack them, is there any reason to believe that C was not intended to
allow use of such features on machines where they were present?

If e.g. Ritchie had intended that programmers should not have any expectation
about integer overflow on two's-complement silent-wraparound hardware, why
do you suppose K&R1 describes the behavior of overflow as being dependent
upon the underlying platform? Were K&R misinformed about how C works?
Scott Lurndal
2017-05-10 17:37:25 UTC
Post by s***@casperkitty.com
Post by Richard Bos
Post by s***@casperkitty.com
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
I think the 1974 C Reference Manual is probably a good indication of what
C was designed to be. Would you disagree?
Yes I would disagree. It is simply documenting the language, as it existed, in 1974.

The fact that there were two _published_ books subsequently (the second incorporating
X3J11) should be sufficient to show that the 1974 manual wasn't anything other
than documentation for the existing compiler.
s***@casperkitty.com
2017-05-10 17:53:25 UTC
Post by Scott Lurndal
Post by s***@casperkitty.com
Post by Richard Bos
Post by s***@casperkitty.com
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
I think the 1974 C Reference Manual is probably a good indication of what
C was designed to be. Would you disagree?
Yes I would disagree. It is simply documenting the language, as it existed, in 1974.
The fact that there were two _published_ books subsequently (the second incorporating
X3J11) should be sufficient to show that the 1974 manual wasn't anything other
than documentation for the existing compiler.
If you think K&R1 is a better guide to what C was designed to be, I could
go along with that. Does K&R1 suggest that programmers should not expect
that a compiler suitable for low-level programming on a platform that has
defined integer-overflow behavior will treat integer overflow in a fashion
consistent with the platform's defined behavior in the absence of explicit
documentation to the contrary?
Tim Rentsch
2017-05-14 19:39:23 UTC
Post by s***@casperkitty.com
Post by Richard Bos
Post by s***@casperkitty.com
C was designed to be a small language which could be processed
into usable machine code by a compiler running on a minimal
system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
I think the 1974 C Reference Manual is probably a good indication
of what C was designed to be. Would you disagree? [...]
Yes, I do. That document shows what C was, at one point in time,
for one specific implementation (which the document itself points
out). There are some elements of C that remained mostly fixed,
but clearly the design evolved over time; one has only to look
at "The C Programming Language", published four years later, to
see that this is true. Furthermore the design of C didn't stop
in 1978 or 1980 - it continued on through the 1980's and the ANSI
standardization effort, with Dennis Ritchie's participation in
that effort. A major aspect of the ANSI standardization, and one
that Ritchie surely must have realized and played a part in, is
consideration of how C does or might work on processors other
than the few on which it was originally implemented. The idea
that C's design was complete in 1974 is laughable, not to mention
contradicted by historical evidence.
s***@casperkitty.com
2017-05-15 14:58:56 UTC
Post by Tim Rentsch
A major aspect of the ANSI standardization, and one
that Ritchie surely must have realized and played a part in, is
consideration of how C does or might work on processors other
than the few on which it was originally implemented. The idea
that C's design was complete in 1974 is laughable, not to mention
contradicted by historical evidence.
When the 1974 manual was written, C worked on two processors which had noticeably
different architectures, establishing a pattern that suggested that certain
features of the language would vary based upon the hosting CPU. I don't
think I've ever claimed that C should use two's-complement format integers
when running on, or called upon to emulate, sign-magnitude or ones'-
complement hardware.

On the other hand, I see no evidence that allowances for integer treatments
other than silent-wraparound or silent (and not necessarily consistent)
promotion were intended for any purpose other than to either facilitate
implementations on hardware whose underlying semantics were inconsistent
with those, or to allow for implementations that would affirmatively trap
integer overflow *in documented fashion*. I am aware of *ZERO* twentieth-
century evidence that anyone of note thought that quality implementations
for two's-complement hardware should not be expected to either yield some
possibly-meaningless result, trap in documented fashion, or choose in
Unspecified fashion between those two behaviors.
Richard Bos
2017-05-15 12:34:58 UTC
Post by s***@casperkitty.com
Post by Richard Bos
Post by s***@casperkitty.com
C was designed to be a small language which could be processed into usable
machine code by a compiler running on a minimal system.
Yes, and _not_ as a portable assembler, or as a language which is
specifically tuned to a mythical "every 1980s computer".
I think the 1974 C Reference Manual is probably a good indication of what
C was designed to be. Would you disagree?
Nice try, but I'm not getting into these games with you.
Post by s***@casperkitty.com
If e.g. Ritchie had intended
Stop pretending that you can read dmr's mind. It's distasteful.

Richard
s***@casperkitty.com
2017-05-15 15:02:30 UTC
Post by Richard Bos
Post by s***@casperkitty.com
If e.g. Ritchie had intended
Stop pretending that you can read dmr's mind. It's distasteful.
I am claiming that what he wrote is consistent with certain intentions, and
inconsistent with others. People who write things often do so *for the
purpose* of making their intentions known, and respect for someone's writing
would require acknowledging intentions expressed thereby.

Ben Bacarisse
2017-05-08 17:24:49 UTC
Thiago Adams <***@gmail.com> writes:

<snip>
(You might consider cutting some of these many many lines. You are not
commenting on all of them.)
Post by Thiago Adams
If the parser has the information of preprocessor and vice versa we can
do and check a lot of things. This is the integration.
For instance
#if sizeof(int)
preprocessor can use sizeof
#if symbol_defined(F)
preprocessor can ask if function F was defined.
A small point: I think you mean "declared".
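
As a point of reference, a sketch of what standard C offers today in place of '#if sizeof(int)': the preprocessor can compare the <limits.h> macros, and sizeof can be checked at compile time after preprocessing with static_assert (C11). INT_AT_LEAST_32_BITS and int_bits are names invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* The preprocessor cannot evaluate sizeof, but it can test INT_MAX. */
#if INT_MAX >= 2147483647
#define INT_AT_LEAST_32_BITS 1
#else
#define INT_AT_LEAST_32_BITS 0
#endif

/* sizeof is usable in a compile-time check once preprocessing is done. */
static_assert(sizeof(int) * CHAR_BIT >= 16, "int must have at least 16 bits");

/* Hypothetical helper: storage width of int in bits. */
static int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Neither form lets a directive ask whether a function has been declared, which is the part of the proposal with no current equivalent.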

<snip>
--
Ben.
Thiago Adams
2017-05-10 12:32:41 UTC
Post by Thiago Adams
Post by David Kleinecke
Post by bartc
Post by Thiago Adams
Post by Thiago Adams
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if,#else etc I have a kind of state machine to tell when tokens are ignored or not.
#define A 0
1 + /*comment*/ A
The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.
I am planning to collect #define, #undef and comments and put them inside AST nodes.
When the AST node is created it will ask the scanner "give me all the collected comments and preprocessor directives", then "clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
Using this scanner I managed to rebuild source code from the AST with macros.
I can put the macro call instead of the expansion in some places decided by me.
When I get a token, I can ask if that token is at the beginning of some macro expansion. I can also ask if the token is the end of the macro expansion.
#define NULL ((void*)0)
int * p = NULL;
So, when I parse the primary-expression ((void*)0) I can ask if I am at the beginning of some macro expansion. Token '(' is the beginning of the expansion of NULL.
When the primary-expression ends, I ask if the macro expansion ended as well exactly at the end of primary-expression.
If this is true, then I replace the whole primary-expression by the macro call; otherwise the macro expansion is used.
I did this in some places (some grammar productions).
I don't know if anyone else is interested in this subject of rebuilding the source code, or of the preprocessor as a parser detail. I also managed to keep or drop #includes. I can generate the amalgamation if desired or keep the includes. External includes are always kept so the source code can be used on other platforms without rebuilding.
Well, it's interesting that some of this stuff is possible to do. And it
is intriguing how it might work.
So it sounds like macro-calls need to be well-formed, but how about
a +
#define a x
a + a;
The current PP rules say that this now becomes a+x+x, but a typical AST
(add2 (add1 a1 a2) a3)
where does the #define go? In the source, it's just after add1 so would
(add2 (add1 (#define 'a' 'x') a1 a2) a3)
but its influence would apply to a2 and a3, not a1 and a2. The scope of
#defines is out of kilter with that of block scope and expression precedence.
Or do #defines, the ones to be part of the AST, also need to be properly
placed and follow similar rules to expressions and statements?
--
Bartc
The preprocessor and the parser are different modules. The
parser knows nothing about the #define. The preprocessor
knows nothing about the meaning of "+".
In other topics, I raised the question of integrating the parser and preprocessor in a way that nobody would notice (if desired), or that they would notice only for good reasons.
Post by David Kleinecke
The pre-processor can be (= usually is?) implemented on a token
by token basis incrementally. It generates a new token whenever
the parser asks for a new token. After the pre-processor has
sent "a" and "+" the next call for a token results in processing
the #define, then reading in an "a" recognizing it as a macro
name, expanding the macro and finally returning the first token
in the macro expansion. After passing all the expansion tokens
(in this case just a single "x") the pre-processor reads in
another token ("+") and then finally another macro expansion.
But then you already knew all that.
I do all processing inside the scanner.
The parser just asks for the next token.
a +
#define a x
a + a;
Next -> 'a'
Next -> '+'
Next -> 'x'
Next -> '+'
Next -> 'x'
Next -> ';'
Next -> EOF
But in the middle I can ask the scanner for the collected #defines.
a +
#define a x
a + a;
Next -> 'a'
Next -> '+'
Scanner, did you collect anything?
If yes, put it in the begin-list of the next node.
Scanner, are you at the beginning of a macro expansion?
Next -> 'x'
Scanner, are you at the end of a macro expansion? If yes, and I am in
the primary-expression, then I will give the node an 'expanded call'
that can be used later to generate the macro call instead of the
expansion.
Next -> '+'
//same
Next -> 'x'
//same
Next -> ';'
Next -> EOF
I had an idea about how to parse and rebuild #ifdefs.
Basically I will treat the TRUE path of an #if like an "#include", and
the FALSE paths will work like comments.

#ifdef TRUE
A
#else

#endif

Something similar to

//#ifdef TRUE
#include "a.h" where a.h is A
//#else
//
//#endif

The #if will be separated into two parts.

The FALSE part will work as a big comment. It's not analyzed
but can be used to rebuild the source.
The first and second parts are inserted into the begin-list
and end-list of some nodes.

If this were a code-formatting tool, for instance, the true path would be
formatted perfectly, but the false paths would be left unchanged or formatted
with a "relaxed" parser.
Thiago Adams
2017-05-06 19:04:18 UTC
Post by bartc
Post by Thiago Adams
Post by Thiago Adams
I do all preprocessing (#if, #include, macro expansion) inside the scanner.
For #if,#else etc I have a kind of state machine to tell when tokens are ignored or not.
#define A 0
1 + /*comment*/ A
The parser will ask NextToken that returns '1',
then NextToken that returns '+',
then NextToken that returns '0'.
I am planning to collect #define, #undef and comments and put them inside AST nodes.
When the AST node is created it will ask the scanner "give me all the collected comments and preprocessor directives" and then "clear the collected list".
So, /*comment*/ will be inserted at "primary-expression node 0".
Using this scanner I managed to rebuild source code from AST with macro.
I can put the macro call instead of the expansion in some places decided by me.
When I get a token, I can ask if that token is at the beginning of some macro expansion. I also can ask if the token is the end of the macro expansion.
#define NULL ((void*)0)
int * p = NULL;
So, when I parse the primary-expression ((void*)0) I can ask if I am at the beginning of some macro expansion. Token '(' is the beginning of the expansion of NULL.
When the primary-expression ends, I ask if the macro expansion ended as well, exactly at the end of the primary-expression.
If this is true, then I replace the whole primary-expression by the macro call; otherwise the macro expansion is used.
I did this in some places (some grammar productions).
I don't know if someone else is interested in this subject of rebuilding the source code, or in the preprocessor as a parser detail. I also managed to keep or drop #includes: I can generate the amalgamation if desired, or keep the includes. External includes are always kept, so the source code can be used on another platform without a rebuild.
Well, it's interesting that some of this stuff is possible to do. And it
is intriguing how it might work.
So it sounds like macro-calls need to be well-formed, but how about
a +
#define a x
a + a;
The current PP rules say that this now becomes a+x+x, but a typical AST
(add2 (add1 a1 a2) a3)
where does the #define go? In the source, it's just after add1 so would
(add2 (add1 (#define 'a' 'x') a1 a2) a3)
but its influence would apply to a2 and a3, not a1 and a2. The scope of
#defines is out of kilter with that of block scope and expression precedence.
Or do #defines, the ones to be part of the AST, also need to be properly
placed and follow similar rules to expressions and statements?
--
The AST nodes (as I am doing today) didn't change.
But each node has a begin-list and an end-list that can be used
to keep the #defines, comments and #undefs that were previously collected
by the scanner.

So, in this case, the preprocessor directive '#define a x' can be added
to the end-list of the a1 node or to the begin-list of the a2 node.

a + //a1
#define a x
a + //a2
a; //a3

The decision whether they will be collected or not,
or whether they will generate a warning or error, is delegated
to the parser.
One suggestion is to allow them in the same places where _Static_assert can
go, but this is not a problem. It just means more checks at
each grammar production. If I want to regenerate /*comments*/ I
will have to check everywhere.


When I generate code I place the begin-list before the
node and end-list after.
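
One way to picture the begin-list/end-list idea is the C sketch below. It is only an illustration under stated assumptions: struct leaf, its field names and regenerate() are invented here and are not taken from the parser described in this post, and each "list" is reduced to a single string to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical leaf node: its own source text plus whatever the
   scanner collected just before and just after it. */
struct leaf {
    const char *begin_list;  /* e.g. "#define a x\n", or NULL */
    const char *text;        /* the node's text as written in the source */
    const char *end_list;    /* trailing directives/comments, or NULL */
};

/* Append s to the NUL-terminated buffer out without overflowing cap bytes. */
static void append(char *out, size_t cap, const char *s)
{
    if (s != NULL)
        strncat(out, s, cap - strlen(out) - 1);
}

/* Rebuild the source: begin-list, node text, end-list, node by node. */
void regenerate(const struct leaf *nodes, size_t count, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        append(out, cap, nodes[i].begin_list);
        append(out, cap, nodes[i].text);
        append(out, cap, nodes[i].end_list);
    }
}
```

With three nodes for a1, a2 and a3, and '#define a x' stored in the begin-list of a2, this walk gives back the text in its original order.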

This sample can be re-generated as it is.

a +
#define a x
a + a;


When the primary-expression a (a2) is parsed and the current token is 'x', the
parser will understand that this is the beginning of the expansion of
macro 'a'. When the token '+' is the current token, the
parser will know that the expansion of 'a' ended.

But it ended at the exact point where the primary expression
ended. So I can decide to replace that primary expression by
the macro call or do nothing.
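
The boundary test described here could look roughly like the following C sketch. The token layout and the name macro_call_for() are assumptions made for the example; the only thing taken from the post is the rule itself: a primary-expression may be printed as the macro call when one top-level expansion starts on its first token and ends on its last token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token record: the scanner marks where a top-level
   macro expansion starts and where it stops. */
struct token {
    const char *text;       /* token spelling after expansion */
    const char *macro;      /* macro that produced it, or NULL */
    bool begins_expansion;  /* first token of a top-level expansion */
    bool ends_expansion;    /* last token of a top-level expansion */
};

/* Return the macro name if tokens first..last are exactly one
   top-level expansion, otherwise NULL (keep the expanded tokens). */
const char *macro_call_for(const struct token *toks, size_t first, size_t last)
{
    assert(first <= last);
    if (!toks[first].begins_expansion || !toks[last].ends_expansion)
        return NULL;
    /* No expansion may end or start strictly inside the span. */
    for (size_t i = first; i < last; i++)
        if (toks[i].ends_expansion || toks[i + 1].begins_expansion)
            return NULL;
    return toks[first].macro;
}
```

For the tokens of ((void*)0) coming from NULL the function returns "NULL". For a macro such as `#define X a +`, where the primary-expression is only the token 'a' and the expansion ends later at '+', it returns NULL and the expanded text is kept.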

This one

#define X a +
int main()
{
int a;
X 1;
}


Will generate

#define X a +
int main()
{
int a;
a + 1;
}

Because the macro expansion of X didn't end at the
same point as the primary-expression. I can decide where to
put these rules. My current rule is inside the
primary-expression and initializers.
For my personal use, I don't want to allow this kind of macro
expansion, or I don't care if the generated code is not similar.

For the keywords I need a similar decision. I have to decide if
I will keep the macro or the keyword. A good example of this
is bool.


I am not checking anything at inner macro expansions.

#define NULL ((void*)0)
#define X 1 + NULL
int main()
{
int a;
a + X;
}

is expanded to

int main()
{
int a;

//results (( void*)0) instead of NULL because NULL
//is not recognized at inner expansions

a+1+(( void*)0);
}

Changing
#define X 1 + NULL
To
#define X (1 + NULL)

generates:

a+X;

because now
#define X (1 + NULL)
works as a primary-expression. The inner expansions are not relevant.

In my code the first macro expansion calls another algorithm
that does all the inner expansions and returns a string.
This string is pushed to the scanner, similar to
an #include.
s***@casperkitty.com
2017-04-28 16:53:31 UTC
Post by Thiago Adams
There is one C++ compiler (Comeau), apart from Cfront, that generates C code.
I am curious about the output, how they managed the generation etc., but
I can't find samples, or a trial download, etc.
Do any such compilers make any effort to ensure defined behavior in cases
which it is required by the C++ Standard but not the C Standard, but where
most C compilers handle things as needed even when not required to do so?
David Brown
2017-04-28 17:47:06 UTC
Post by s***@casperkitty.com
Post by Thiago Adams
There is one C++ compiler (Comeau), apart from Cfront, that generates C code.
I am curious about the output, how they managed the generation etc., but
I can't find samples, or a trial download, etc.
Do any such compilers make any effort to ensure defined behavior in cases
which it is required by the C++ Standard but not the C Standard, but where
most C compilers handle things as needed even when not required to do so?
What cases are these? I can't think off-hand of any situation where you
can write something that is valid as C and C++, has basically the same
meaning, but is fully defined by the C++ standards and not by the C
standards.
s***@casperkitty.com
2017-04-28 18:56:38 UTC
Post by David Brown
What cases are these? I can't think off-hand of any situation where you
can write something that is valid as C and C++, has basically the same
meaning, but is fully defined by the C++ standards and not by the C
standards.
How about

int shiftXY(int x, int y) { return x << y; }

The current C++ Standard defines shiftXY(1,31) as being equivalent to
(int)(1u << 31). I don't like that change (I think the result should
have made the behavior of the shift Implementation-Defined, independent
of the behavior of unsigned-to-signed conversion), but the present
Standard(*) clearly and deliberately defines the meaning of that code in
a case where the C Standard does not.

(*) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
See section 5.8 paragraph 2.
Keith Thompson
2017-04-28 19:21:44 UTC
Post by s***@casperkitty.com
Post by David Brown
What cases are these? I can't think off-hand of any situation where you
can write something that is valid as C and C++, has basically the same
meaning, but is fully defined by the C++ standards and not by the C
standards.
How about
int shiftXY(int x, int y) { return x << y; }
The current C++ Standard defines shiftXY(1,31) as being equivalent to
(int)(1u << 31).
I fail to see how wrapping it in a function made it any clearer, but OK.
Post by s***@casperkitty.com
I don't like that change (I think the result should
have made the behavior of the shift Implementation-Defined, independent
of the behavior of unsigned-to-signed conversion), but the present
Standard(*) clearly and deliberately defines the meaning of that code in
a case where the C Standard does not.
(*) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
See section 5.8 paragraph 2.
You're right. That change was made after the ISO C++11 standard,
and it still appears in the draft C++17 standard (N4296). I was
mistaken when I previously stated otherwise. Apparently it also
appears in C++14 (I don't have a copy).

Note that the result is implementation-defined, because the result of
the conversion is implementation-defined. (It's undefined behavior
in C.)

Here's the wording from the C++11 standard, discussing E1 << E2
(changing some of the punctuation to avoid non-ASCII characters):

Otherwise, if E1 has a signed type and non-negative value,
and E1 * 2**E2 is representable in the result type, then that
is the resulting value; otherwise, the behavior is undefined.

The N4296 draft of C++17 says:

Otherwise, if E1 has a signed type and non-negative value, and
E1 * 2**E2 is representable in the corresponding unsigned type of
the result type, then that value, converted to the result type,
is the resulting value; otherwise, the behavior is undefined.

I don't know why that change was made.

To answer your original question, a C++ implementation that uses
C intermediate code must do whatever is necessary to ensure that
it behaves in accordance with whichever C++ standard it claims to
support. That might or might not involve some extra work for the
"<<" operator with a signed left operand. I have no idea what any
such implementations actually do.
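
As a rough idea of what such "extra work" might look like, here is a C sketch of a helper that a C++-to-C translator could emit for a signed left shift. This is an assumption for illustration only: cxx_shl_int is an invented name, and nothing here is taken from Comeau, Cfront or any real implementation. The shift is done in unsigned arithmetic, which is always defined in C, and the result is then converted back to int, which is implementation-defined in C when the value does not fit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper for E1 << E2 with int operands, following the
   N4296 wording: shift in the corresponding unsigned type, then
   convert the result back to the signed type. */
int cxx_shl_int(int e1, int e2)
{
    /* Negative E1 or an out-of-range count is undefined in C++ too,
       so it is treated as a caller error here. The width computation
       assumes int has no padding bits. */
    assert(e1 >= 0);
    assert(e2 >= 0 && e2 < (int)(sizeof(int) * CHAR_BIT));

    unsigned wide = (unsigned)e1 << e2;  /* defined: unsigned arithmetic */
    return (int)wide;  /* implementation-defined in C if wide > INT_MAX */
}
```

A translator that knows its target C compiler already treats signed "<<" this way could skip the helper and emit the plain operator.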

(BTW, thank you for posting with shorter lines.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-28 20:07:32 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
int shiftXY(int x, int y) { return x << y; }
The current C++ Standard defines shiftXY(1,31) as being equivalent to
(int)(1u << 31).
I fail to see how wrapping it in a function made it any clearer, but OK.
It makes clear that x and y are values of type "int" that may or may not
be constants.
Post by Keith Thompson
Post by s***@casperkitty.com
I don't like that change (I think the result should
have made the behavior of the shift Implementation-Defined, independent
of the behavior of unsigned-to-signed conversion), but the present
Standard(*) clearly and deliberately defines the meaning of that code in
a case where the C Standard does not.
(*) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
See section 5.8 paragraph 2.
You're right. That change was made after the ISO C++11 standard,
and it still appears in the draft C++17 standard (N4296). I was
mistaken when I previously stated otherwise. Apparently it also
appears in C++14 (I don't have a copy).
Note that the result is implementation-defined, because the result of
the conversion is implementation-defined. (It's undefined behavior
in C.)
The behavior is defined as yielding an Implementation-Defined value. On
a two's-complement system, such behavior would make sense, but on a
36-bit ones'-complement system it really doesn't make sense to say that
a template substitution which would produce 1<<35 must be considered valid
even though it's unlikely the resulting value would be useful.

Rather than trying to bodge a particular spot where some implementations
would define a useful behavior but were forbidden from doing so, a better
approach would be to split UB into two subcategories: actions for which
many implementations should define meaningful behaviors, but where some
implementations may be unable to do so, versus actions for which no
meaningful behavior would be defined on any platform, and allow expressions
of the former type to be used in compile-time constants on platforms that
define their meaning.
Post by Keith Thompson
Here's the wording from the C++11 standard, discussing E1 << E2
Otherwise, if E1 has a signed type and non-negative value,
and E1 * 2**E2 is representable in the result type, then that
is the resulting value; otherwise, the behavior is undefined.
The N4296 draft of C++17 says:
Otherwise, if E1 has a signed type and non-negative value, and
E1 * 2**E2 is representable in the corresponding unsigned type of
the result type, then that value, converted to the result type,
is the resulting value; otherwise, the behavior is undefined.
I don't know why that change was made.
From what I understand, the change was made because compilers were
expressly forbidden from accepting any expression that invoked Undefined
Behavior in any context requiring a compile-time constant. In C++, there
is a construct which means, "try interpreting the following code with
various type substitutions until one is found that works"; it would be
of limited usefulness if a substitution that could launch missiles even
before any user code executes could be chosen in preference to one which
behaved in defined fashion.
David Brown
2017-04-28 20:23:12 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
Post by David Brown
What cases are these? I can't think off-hand of any situation where you
can write something that is valid as C and C++, has basically the same
meaning, but is fully defined by the C++ standards and not by the C
standards.
How about
int shiftXY(int x, int y) { return x << y; }
The current C++ Standard defines shiftXY(1,31) as being equivalent to
(int)(1u << 31).
I fail to see how wrapping it in a function made it any clearer, but OK.
Post by s***@casperkitty.com
I don't like that change (I think the result should
have made the behavior of the shift Implementation-Defined, independent
of the behavior of unsigned-to-signed conversion), but the present
Standard(*) clearly and deliberately defines the meaning of that code in
a case where the C Standard does not.
(*) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
See section 5.8 paragraph 2.
You're right. That change was made after the ISO C++11 standard,
and it still appears in the draft C++17 standard (N4296). I was
mistaken when I previously stated otherwise. Apparently it also
appears in C++14 (I don't have a copy).
Note that the result is implementation-defined, because the result of
the conversion is implementation-defined. (It's undefined behavior
in C.)
Here's the wording from the C++11 standard, discussing E1 << E2
Otherwise, if E1 has a signed type and non-negative value,
and E1 * 2**E2 is representable in the result type, then that
is the resulting value; otherwise, the behavior is undefined.
The N4296 draft of C++17 says:
Otherwise, if E1 has a signed type and non-negative value, and
E1 * 2**E2 is representable in the corresponding unsigned type of
the result type, then that value, converted to the result type,
is the resulting value; otherwise, the behavior is undefined.
I don't know why that change was made.
I'd been looking at the C++11 standard - it had not occurred to me that
there would be such a change between standard versions.
Post by Keith Thompson
To answer your original question, a C++ implementation that uses
C intermediate code must do whatever is necessary to ensure that
it behaves in accordance with whichever C++ standard it claims to
support. That might or might not involve some extra work for the
"<<" operator with a signed left operand. I have no idea what any
such implementations actually do.
The results of some bitwise operations on signed integers (C90 6.3,
C99 and C11 6.5).
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed ‘>>’ acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude
given in C99 and C11 only to treat certain aspects of signed ‘<<’ as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
That last paragraph does not sound very clear to me. Did the behaviour
of signed << change between C90 and C99 ?
s***@casperkitty.com
2017-04-28 21:03:32 UTC
Post by David Brown
That last paragraph does not sound very clear to me. Did the behaviour
of signed << change between C90 and C99 ?
In C89, the behavior was defined in terms of physical bit shifting, regardless
of how integers were stored. Thus, the expression -4 << 1 would yield

-8 on a two's-complement machine (111...1100 << 1 yields 111...11000)
-9 on a ones'-complement machine (111...1011 << 1 yields 111...10110)
+8 on a sign-magnitude machine (100...0100 << 1 yields 000...01000)

Depending upon what one is trying to do with a particular implementation,
it may be more helpful to have a left-shift do one of the above (not
necessarily the one associated with the hardware platform) or select
arbitrarily between them (e.g. on machines where left-shifting a register
by one is sometimes slower than adding a number to itself, but is sometimes
faster, it may be helpful to let a compiler do whichever is more efficient
in any given situation). When porting code among systems, it may also be
useful to have an option to trap on situations where behaviors could differ.

Short of introducing a new category of behavior, the only way the C99
Standard could allow implementations the freedom to behave in those
various *useful* fashions was to treat left shifts of negative numbers
as Undefined Behavior.
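
The three -4 << 1 results listed above can be checked with a small simulation. The sketch below is only an illustration: it assumes an 8-bit word, the names are invented, and it reads the third bit pattern (100...0100) as sign-magnitude, since that is the representation in which that pattern means -4.

```c
#include <assert.h>
#include <stdint.h>

enum repr { TWOS_COMPLEMENT, ONES_COMPLEMENT, SIGN_MAGNITUDE };

/* Encode a value in -127..127 as an 8-bit pattern in representation r. */
static uint8_t encode8(int value, enum repr r)
{
    unsigned mag = (unsigned)(value < 0 ? -value : value);
    if (value >= 0)
        return (uint8_t)mag;
    switch (r) {
    case TWOS_COMPLEMENT: return (uint8_t)(~mag + 1u);
    case ONES_COMPLEMENT: return (uint8_t)~mag;
    default:              return (uint8_t)(0x80u | mag);
    }
}

/* Decode an 8-bit pattern back to a value in representation r. */
static int decode8(uint8_t bits, enum repr r)
{
    if (!(bits & 0x80u))
        return bits;
    switch (r) {
    case TWOS_COMPLEMENT: return (int)bits - 256;
    case ONES_COMPLEMENT: return -(int)(uint8_t)~bits;
    default:              return -(int)(bits & 0x7Fu);
    }
}

/* Shift the raw bit pattern left, filling with zeros and dropping
   bits shifted out of the 8-bit word. */
int shift_left_physical(int value, unsigned count, enum repr r)
{
    assert(count < 8);
    uint8_t bits = encode8(value, r);
    bits = (uint8_t)(bits << count);
    return decode8(bits, r);
}
```

Shifting -4 by one position gives -8, -9 and +8 for the three representations, which is the spread of results that C99 declined to pick from.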
Keith Thompson
2017-04-28 21:57:47 UTC
[...]
Post by David Brown
Post by Keith Thompson
Here's the wording from the C++11 standard, discussing E1 << E2
Otherwise, if E1 has a signed type and non-negative value,
and E1 * 2**E2 is representable in the result type, then that
is the resulting value; otherwise, the behavior is undefined.
The N4296 draft of C++17 says:
Otherwise, if E1 has a signed type and non-negative value, and
E1 * 2**E2 is representable in the corresponding unsigned type of
the result type, then that value, converted to the result type,
is the resulting value; otherwise, the behavior is undefined.
I don't know why that change was made.
I'd been looking at the C++11 standard - it had not occurred to me that
there would be such a change between standard versions.
I found the C++ DR that triggered the change.

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1457

The rationale given in the DR is:

As a result, this technique cannot be used in a constant expression,
which will break a significant amount of code.
Post by David Brown
Post by Keith Thompson
To answer your original question, a C++ implementation that uses
C intermediate code must do whatever is necessary to ensure that
it behaves in accordance with whichever C++ standard it claims to
support. That might or might not involve some extra work for the
"<<" operator with a signed left operand. I have no idea what any
such implementations actually do.
The results of some bitwise operations on signed integers (C90 6.3,
C99 and C11 6.5).
Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed ">>" acts on
negative numbers by sign extension.
As an extension to the C language, GCC does not use the latitude
given in C99 and C11 only to treat certain aspects of signed "<<" as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.
That last paragraph does not sound very clear to me. Did the behaviour
of signed << change between C90 and C99 ?
C90 says:

The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros. If E1 has an unsigned
type, the value of the result is E1 multiplied by the quantity,
2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1
has type unsigned long, UINT_MAX+1 otherwise. (The constants
ULONG_MAX and UINT_MAX are defined in the header <limits.h>.)

C99 says (and C11 is identical) (I've changed some of the punctuation
to avoid non-ASCII characters):

The result of E1 << E2 is E1 left-shifted E2 bit positions;
vacated bits are filled with zeros. If E1 has an unsigned type,
the value of the result is E1 * 2^E2, reduced modulo one more
than the maximum value representable in the result type. If
E1 has a signed type and nonnegative value, and E1 * 2^E2 is
representable in the result type, then that is the resulting
value; otherwise, the behavior is undefined.

The wording for the maximum value of an unsigned type was made more
general due to C99's addition of long long, and the final sentence:

If E1 has a signed type and nonnegative value, and E1 * 2^E2 is
representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.

was added by C99. It's difficult to say what C90's intent was when E1
is negative. I'd say that sentence was added precisely because the C90
wording was unclear.
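
The sentence added by C99 can be turned into a simple runtime check. The following is a sketch for int operands only; the function name is made up, and the width computation assumes int has no padding bits. The shift-count test comes from a separate rule in the same section of the standard (a negative count, or one not less than the width of the promoted left operand, is undefined).

```c
#include <limits.h>
#include <stdbool.h>

/* True if E1 << E2 has defined behaviour for int operands under the
   C99/C11 wording quoted above. */
bool shl_defined_c99(int e1, int e2)
{
    /* Shift count must be non-negative and less than the width. */
    if (e2 < 0 || e2 >= (int)(sizeof(int) * CHAR_BIT))
        return false;
    /* A negative left operand makes the behaviour undefined. */
    if (e1 < 0)
        return false;
    /* E1 * 2^E2 must be representable: E1 <= INT_MAX / 2^E2. */
    return e1 <= (INT_MAX >> e2);
}
```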
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-28 22:57:15 UTC
Post by Keith Thompson
was added by C99. It's difficult to say what C90's intent was when E1
is negative. I'd say that sentence was added precisely because the C90
wording was unclear.
The meaning of "shift left" could be ambiguous on a sign-magnitude machine,
but the statement that shifted-in bits are zeroes eliminates any ambiguity
about how ones'-complement machines should behave [instead mandating what
for many purposes would be the less useful behavior]. In two's-complement
systems, the behavior of a logical and arithmetic left shift are equivalent,
so there would be no need to resolve ambiguities on those systems except in
cases where they are configured to do something other than use normal two's-
complement semantics.
Keith Thompson
2017-04-28 18:49:43 UTC
Post by s***@casperkitty.com
Post by Thiago Adams
There is one C++ compiler (Comeau), apart from Cfront, that generates C code.
I am curious about the output, how they managed the generation etc., but
I can't find samples, or a trial download, etc.
Do any such compilers make any effort to ensure defined behavior in cases
which it is required by the C++ Standard but not the C Standard, but where
most C compilers handle things as needed even when not required to do so?
Do you have an example of such a case?

I don't know the answer to your question, but a C++ implementation that
generates C intermediate code obviously must conform to the C++
standard. It can do so by generating portable C code (which, if the
cases you describe actually exist, might be some extra effort), or by
relying on the known behavior of the C compiler. That might include
passing additional command-line arguments.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
jacobnavia
2017-04-26 11:31:30 UTC
Post by bartc
Post by Ben Bacarisse
Post by bartc
(It's still a puzzle why I didn't see the clash on my Linux.)
Didn't "man strmode" solve the puzzle for you? (I'd have written
strmode(3) except that notation seems to be almost unknown these days.)
Well, the answer is that 'strmode()' doesn't exist on my Linux.
Otherwise I would have noticed the clash in my tests and done something
about it.
(The puzzle then becomes why strmode() isn't there; maybe it's only on
FreeBSD or something because I first tried 'man strmode' online and it
was on a site for FreeBSD. I think jacob uses a Mac so that could be
another difference.)
Yes, it is absent on Linux but present on my Mac, OS X 10.11.6,
in /usr/include/string.h

I would just suggest that you replace strmode by Strmode, and be done
with it.
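
Another way to sidestep the clash is a project prefix instead of a change of case. The sketch below only illustrates the naming; bcc_strmode and the MODE_ constants are invented for this example and are not the real function or type codes from bartcc.c.

```c
#include <string.h>  /* on macOS and the BSDs this also declares strmode(int, char *) */

/* Hypothetical type codes, standing in for the compiler's real 'modes'. */
enum { MODE_VOID, MODE_INT32, MODE_INT64, MODE_COUNT };

/* Prefixed replacement for the clashing name: returns the name of a
   mode, or "?" for an unknown code. */
const char *bcc_strmode(int mode)
{
    static const char *const names[MODE_COUNT] = { "void", "int32", "int64" };
    return (mode >= 0 && mode < MODE_COUNT) ? names[mode] : "?";
}
```

A prefixed name no longer collides with the BSD strmode() declaration, and it also stays clear of the str- names that the C standard reserves for future library use.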
GOTHIER Nathan
2017-04-26 13:14:09 UTC
On Wed, 26 Apr 2017 13:31:30 +0200
Post by jacobnavia
I would just suggest that you replace strmode by Strmode, and be done
with it.
I think renaming his strmode function as Strmode creates confusion. I would
rather suggest using an unambiguous name such as strstat, strset, ... or
treating the BSD conflict as a buggy case and warning the programmer of the
incompatibility if you really like the name.
bartc
2017-04-26 14:10:45 UTC
Post by GOTHIER Nathan
On Wed, 26 Apr 2017 13:31:30 +0200
Post by jacobnavia
I would just suggest that you replace strmode by Strmode, and be done
with it.
I think renaming his strmode function as Strmode creates confusion.
I doubt it:

(1) Few are going to read the source

(2) That function is to do with debugging and is not an essential one (in
understanding the code)

(3) Probably few have heard of 'strmode', if it exists on neither
Windows nor Linux

(4) Since C is case-sensitive, there can be no confusion between Str and
str.

What /is/ confusing is the plethora of string-handling routines starting
with str- or based around str- functions. Especially coming out of
Microsoft.
Post by GOTHIER Nathan
I would
rather suggest using an unambiguous name such as strstat, strset, ... or
treating the BSD conflict as a buggy case and warning the programmer of the
incompatibility if you really like the name.
I often use the str- prefix for stringifying routines. I rarely work in
mixed-case so would not want to use Str (although I have done in this
instance). I don't like that the language has a monopoly on names
starting with ordinary letters of the alphabet.
--
bartc
GOTHIER Nathan
2017-04-26 14:36:55 UTC
On Wed, 26 Apr 2017 15:10:45 +0100
Post by bartc
(1) Few are going to read the source
(2) That function is to do with debugging and is not an essential one (in
understanding the code)
(3) Probably few have heard of 'strmode', if it exists on neither
Windows nor Linux
Actually I don't see the point of using the str prefix for a private string
function. Indeed, the Str prefix should be enough to distinguish your function
from the BSD one like mystring prefix for example.
Post by bartc
(4) Since C is case-sensitive, there can be no confusion between Str and
str.
Nevertheless the human brain reading interpolation is pretty case insensitive.
Post by bartc
What /is/ confusing is the plethora of string-handling routines starting
with str- or based around str- functions. Especially coming out of
Microsoft.
I agree the C standard library shouldn't be as bloated as the STL.
Malcolm McLean
2017-04-26 15:49:44 UTC
Post by GOTHIER Nathan
Post by bartc
What /is/ confusing is the plethora of string-handling routines starting
with str- or based around str- functions. Especially coming out of
Microsoft.
I agree the C standard library shouldn't be as bloated as the STL.
C is not going to be the language of choice for string handling, except as implementation
language for a sophisticated library such as suffix tree strings.
David Brown
2017-04-26 18:44:31 UTC
Post by GOTHIER Nathan
On Wed, 26 Apr 2017 15:10:45 +0100
Post by bartc
(1) Few are going to read the source
(2) That function is to do with debugging and is not an essential one (in
understanding the code)
(3) Probably few have heard of 'strmode', if it exists on neither
Windows nor Linux
Actually I don't see the point of using the str prefix for a private string
function. Indeed, the Str prefix should be enough to distinguish your function
from the BSD one like mystring prefix for example.
Post by bartc
(4) Since C is case-sensitive, there can be no confusion between Str and
str.
Nevertheless the human brain reading interpolation is pretty case insensitive.
You are contradicting yourself. First you say using "Str" instead of
"str" should be enough to distinguish the functions, then you say that
they are hard to distinguish for humans.

Personally, I find it quite easy to distinguish upper and lower cases
when reading function names. But some people find it harder - it
depends on how much you read by the visual shape of a word, and how much
by the sound of the word.
Post by GOTHIER Nathan
Post by bartc
What /is/ confusing is the plethora of string-handling routines starting
with str- or based around str- functions. Especially coming out of
Microsoft.
I agree the C standard library shouldn't be as bloated as the STL.
How can you "agree" with something that was never mentioned? No one is
complaining about the str... functions in the C standard library - the
complaint is about /additional/ functions in C library implementations.
The BSD C library has a number of extra str... functions, which can
quickly be confusing until you are familiar with them. MS's libraries
have many more, and seem to regularly deprecate existing functions as
"insecure" or "unsafe", and add new ones - only to find that these too
have flaws and need replacing.

The STL is C++, and is a totally different beast. Whether or not it is
"bloated", rather than "feature-filled", is a matter of opinion. But it
does not suffer from the name clashes of C libraries to nearly the same
extent, because C++ has hierarchical naming (through classes and
namespaces).
GOTHIER Nathan
2017-04-26 19:02:53 UTC
On Wed, 26 Apr 2017 20:44:31 +0200
Post by David Brown
Personally, I find it quite easy to distinguish upper and lower cases
when reading function names. But some people find it harder - it
depends on how much you read by the visual shape of a word, and how much
by the sound of the word.
Please go shag an other fly... thanks.

PS: I'm not interested in your C++ craps.
David Brown
2017-04-26 20:07:07 UTC
Post by GOTHIER Nathan
On Wed, 26 Apr 2017 20:44:31 +0200
Post by David Brown
Personally, I find it quite easy to distinguish upper and lower cases
when reading function names. But some people find it harder - it
depends on how much you read by the visual shape of a word, and how much
by the sound of the word.
Please go shag an other fly... thanks.
?
Post by GOTHIER Nathan
PS: I'm not interested in your C++ craps.
/You/ brought up C++, not me. Or perhaps you don't know what the STL
is, even though you are happy to complain about it?
GOTHIER Nathan
2017-04-26 20:56:45 UTC
On Wed, 26 Apr 2017 22:07:07 +0200
Post by David Brown
/You/ brought up C++, not me. Or perhaps you don't know what the STL
is, even though you are happy to complain about it?
And you bring it back because I hit your honor as a C++ guru... then you're
reacting like a child putting fault on others and shagging flies to make you
great (again). That's so pathetic!
David Brown
2017-04-27 09:16:14 UTC
Post by GOTHIER Nathan
On Wed, 26 Apr 2017 22:07:07 +0200
Post by David Brown
/You/ brought up C++, not me. Or perhaps you don't know what the STL
is, even though you are happy to complain about it?
And you bring it back because I hit your honor as a C++ guru... then you're
reacting like a child putting fault on others and shagging flies to make you
great (again). That's so pathetic!
Eh, no. The great majority of my work is C programming, not C++. And
when I /do/ code in C++, I rarely use the STL.

You made a stupid, ignorant and out-of-context remark - I said why it
was wrong. That really should have been the end of it, but you seem to
be unable to avoid posting meaningless insults.

No doubt you will follow this up with more of the same, but I will try
my best to ignore it.
Richard Heathfield
2017-04-27 09:53:53 UTC
<something silly>
Post by David Brown
You made a stupid, ignorant and out-of-context remark - I said why it
was wrong. That really should have been the end of it, but you seem to
be unable to avoid posting meaningless insults.
No doubt you will follow this up with more of the same, but I will try
my best to ignore it.
David, that task is much easier than you are making out. I've been
ignoring everything he writes for some weeks now, and it takes no effort
at all!
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
David Brown
2017-04-27 10:56:36 UTC
Post by Richard Heathfield
<something silly>
Post by David Brown
You made a stupid, ignorant and out-of-context remark - I said why it
was wrong. That really should have been the end of it, but you seem to
be unable to avoid posting meaningless insults.
No doubt you will follow this up with more of the same, but I will try
my best to ignore it.
David, that task is much easier than you are making out. I've been
ignoring everything he writes for some weeks now, and it takes no effort
at all!
I have ignored most (all?) of his posts before these past few - but
those posts were not directed at me. And I know I have been weak-willed
in the past, and risen to bait.
Rick C. Hodgin
2017-04-27 11:13:38 UTC
Bart ... well done on your project thus far. Impressive.

Thank you,
Rick C. Hodgin
bartc
2017-04-27 11:27:59 UTC
Post by Rick C. Hodgin
Bart ... well done on your project thus far. Impressive.
Er, thanks... but when it can compile a few more programs, it'll be better!
--
bartc
Rick C. Hodgin
2017-04-27 13:05:06 UTC
Post by bartc
Post by Rick C. Hodgin
Bart ... well done on your project thus far. Impressive.
Er, thanks... but when it can compile a few more programs, it'll be better!
Undoubtedly. Nonetheless, my hat goes off to you, sir. It's a man's
job (so to speak), and you've borne it well to date.

If I may now be so bold as to continue the Vader quote I originated
above, though without the intervening light sabre strike:

"Impressive. Most impressive."


:-)

Thank you,
Rick C. Hodgin
Ian Collins
2017-04-26 21:38:37 UTC
Post by GOTHIER Nathan
PS: I'm not interested in your C++ craps.
Craps, a dice game, could be programmed in C, C++ or any number of
languages...
--
Ian
David Kleinecke
2017-04-27 00:40:04 UTC
Post by Ian Collins
Post by GOTHIER Nathan
PS: I'm not interested in your C++ craps.
Craps, a dice game, could be programmed in C, C++ or any number of
languages...
Long, long ago the first thing programmed in original BASIC
Geoff
2017-04-27 05:22:51 UTC
Post by bartc
(3) Probably few have heard of 'strmode', if it exists on neither
Windows nor Linux
Typing man strmode on OS X reveals that this function originated in
4.4BSD and the documentation dates from July 28, 1994 so it's quite
possibly as old as if not older than Linux. OS X inherits pedigree
from BSD 4.3 (1986) via NeXTStep which pre-dates Linux and it inherits
again via FreeBSD and 4.4BSD (1992-94). Just because Linux doesn't
have something doesn't mean it can't exist.

Including <string.h> in your compiler obliges you to avoid name
conflicts. If you're extending the library and expect it to be
portable it seems to me you need to research the name space or compile
it yourself on platforms with which you seek compatibility.
Malcolm McLean
2017-04-27 08:45:11 UTC
Post by Geoff
Including <string.h> in your compiler obliges you to avoid name
conflicts. If you're extending the library and expect it to be
portable it seems to me you need to research the name space or compile
it yourself on platforms with which you seek compatibility.
I think the issue is that since the program produces C source, the names of
functions have to be derived from identifiers in the source language.
If rules for implementation-reserved names are complicated, that
makes the process quite difficult. Especially if prefixes like "str" or "mem"
are attractive to programmers in the source language.
Geoff
2017-04-27 04:55:32 UTC
Post by bartc
In this case it's because the translator from a source language (not C)
to C has to be aware of C's rather bizarre restriction that the prefix
'str-' is reserved and can't be used for user identifiers. (So 'weak' is
OK, but not 'strong'; presumably not even 'string' is legal.)
This is not true. Standard C doesn't reserve 'str-' as you describe
but when creating 'str-' functions in your implementation it does put
the responsibility on you to avoid name collisions. This means
researching the topic thoroughly to be sure you're not going to create
conflicts. (The perceived reservation is due to the extensive use by
the standard library of 'str-' functions so the namespace is
relatively crowded unless you like names like strmodality() for your
library extensions.)

The lazy way out is to use the name space reserved for the
implementation and use the leading underscore for the purpose it was
intended. Most implementations conforming to the current standard are
using _strsomething for functions that extend the string function
library.
David Brown
2017-04-27 09:25:45 UTC
Post by Geoff
Post by bartc
In this case it's because the translator from a source language (not C)
to C has to be aware of C's rather bizarre restriction that the prefix
'str-' is reserved and can't be used for user identifiers. (So 'weak' is
OK, but not 'strong'; presumably not even 'string' is legal.)
This is not true. Standard C doesn't reserve 'str-' as you describe
but when creating 'str-' functions in your implementation it does put
the responsibility on you to avoid name collisions.
From §7.1.3, "Each macro name in any of the following subclauses
(including the future library directions) is reserved for use as
specified if any of its associated headers is included, unless
explicitly stated otherwise".

Thus "str-" identifiers /are/ reserved, /if/ you use the <stdlib.h> or
<string.h> headers in your code.

If you avoid these headers, you can use str- identifiers as you like.

However, you /may/ have to use compiler flags to avoid compiler builtins
or other optimisations for existing str- library functions, should you
decide to write your own "strcat" or "strlen" functions with slightly
different semantics from the standard functions.

And I don't know what rules apply if you include <string.h> in one
translation unit, and in another unit you don't use the include file but
define your own strfoo() function with external linkage.
Post by Geoff
This means
researching the topic thoroughly to be sure you're not going to create
conflicts. (The perceived reservation is due to the extensive use by
the standard library of 'str-' functions so the namespace is
relatively crowded unless you like names like strmodality() for your
library extensions.)
The lazy way out is to use the name space reserved for the
implementation and use the leading underscore for the purpose it was
intended. Most implementations conforming to the current standard are
using _strsomething for functions that extend the string function
library.
James R. Kuyper
2017-04-27 16:59:49 UTC
Post by David Brown
Post by Geoff
Post by bartc
In this case it's because the translator from a source language (not C)
to C has to be aware of C's rather bizarre restriction that the prefix
'str-' is reserved and can't be used for user identifiers. (So 'weak' is
Note that this "bizarre" restriction is intended to allow future
versions of the standard to add functions with those names, and for
implementors of the C standard library to add such functions even before
they've been standardized. In general, if anything unexpected happens
due to your use of such an identifier, it will happen because the
identifier you are using is already in use by the C library for some
other, conflicting purpose. I'm not sure why BartC considers that so
bizarre.
Post by David Brown
Post by Geoff
Post by bartc
OK, but not 'strong'; presumably not even 'string' is legal.)
Correct - they both fit the pattern of "str" followed by a lowercase
letter, as described below.
Post by David Brown
Post by Geoff
This is not true. Standard C doesn't reserve 'str-' as you describe
but when creating 'str-' functions in your implementation it does put
the responsibility on you to avoid name collisions.
From §7.1.3, "Each macro name in any of the following subclauses
(including the future library directions) is reserved for use as
specified if any of its associated headers is included, unless
explicitly stated otherwise".
The relevant parts of "Future library Directions" are 7.31.12: "Function
names that begin with str and a lowercase letter may be added to the
declarations in the <stdlib.h> header." and 7.31.13: "Function names
that begin with str, mem, or wcs and a lowercase letter may be added to
the declarations in the <string.h> header." Since those are function
names, which might have external linkage, and not macros, the relevant
clause is actually the one immediately after the one you cited above:

"All identifiers with external linkage in any of the following
subclauses (including the future library directions) and errno are
always reserved for use as identifiers with external linkage." (7.1.3p1)

Note that those identifiers are "always reserved", and not just when you
#include the relevant header.

Of course, "Any function declared in a header may be additionally
implemented as a function-like macro defined in the header," (7.1.4p1),
so what you said about macros does apply if you #include the relevant
header.
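
A minimal sketch of the pattern quoted above, assuming only the two "future library directions" clauses cited in this post (str, mem or wcs followed by a lowercase letter). The helper name is_reserved_str_name is hypothetical, and the check deliberately ignores every other reserved pattern (leading underscores, is/to prefixes and so on):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` begins with "str", "mem"
   or "wcs" followed by a lowercase letter, i.e. the pattern that the
   quoted clauses reserve for future <stdlib.h>/<string.h> functions. */
static bool is_reserved_str_name(const char *name)
{
    static const char *const prefixes[] = { "str", "mem", "wcs" };

    assert(name != NULL);
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        /* If the first three characters match, name[3] is either the
           terminating NUL or a real character, so reading it is safe. */
        if (strncmp(name, prefixes[i], 3) == 0 &&
            islower((unsigned char)name[3]))
            return true;
    }
    return false;
}
```

Under this check 'strmode', 'strong' and 'string' all match, while 'Strmode', 'weak' and 'str_mode' do not, which agrees with the reading given above.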
GOTHIER Nathan
2017-04-27 13:03:23 UTC
On Wed, 26 Apr 2017 21:55:32 -0700
Post by Geoff
The lazy way out is to use the name space reserved for the
implementation and use the leading underscore for the purpose it was
intended. Most implementations conforming to the current standard are
using _strsomething for functions that extend the string function
library.
Most C library implementations use the leading underscore only because that name
space is reserved for the C library implementation. Avoiding name collisions
across several C library implementations may be a harder task than googling the
specific name with the leading underscore.

As a result, any C programmer should avoid the leading underscore unless
implementing the C standard library.

For any private use, I would recommend adding a specific prefix bound to the
project, such as myproject_; the C standard (K&R) allows at least 31
significant leading characters to define the full name.
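
One consequence of a long project prefix is that it uses up significant characters. A rough sketch, assuming the 31-character figure mentioned above (C99 and later guarantee at least 31 significant initial characters for external identifiers); the helper name names_collide and the sample names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed minimum number of significant initial characters for
   external identifiers (the figure discussed above). */
#define SIGNIFICANT_CHARS 31

/* Hypothetical helper: returns true if two identifiers are identical in
   their first SIGNIFICANT_CHARS characters, so that an implementation
   honouring only the minimum could treat them as the same name. */
static bool names_collide(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, SIGNIFICANT_CHARS) == 0;
}
```

With a 10-character prefix such as myproject_, only 21 characters remain to tell names apart on such an implementation. Most current toolchains accept far longer names, so this matters mainly for strictly portable code.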
Scott Lurndal
2017-04-27 13:27:32 UTC
Post by GOTHIER Nathan
On Wed, 26 Apr 2017 21:55:32 -0700
Post by Geoff
The lazy way out is to use the name space reserved for the
implementation and use the leading underscore for the purpose it was
intended. Most implementations conforming to the current standard are
using _strsomething for functions that extend the string function
library.
Most C library implementations use the leading underscore only because that name
space is reserved for the C library implementation. Avoiding name collisions
across several C library implementations may be a harder task than googling the
specific name with the leading underscore.
You're conflating applications with the implementation. Bart is creating
a new implementation of C, so he is allowed to use the leading underscore
in his implementation.
Post by GOTHIER Nathan
As a result, any C programmer should avoid the leading underscore unless
implementing the C standard library.
Which is exactly what Bart is doing.
GOTHIER Nathan
2017-04-25 23:29:42 UTC
On Tue, 25 Apr 2017 14:46:05 GMT
Post by Scott Lurndal
If you're adding functions to the implementation that aren't defined
by one of the relevant standards, should you not be prepending at least
one underscore to the name? Otherwise, you'll likely break existing
programs (as per above).
Reserving names doesn't mean you should waste an entire name space for the needs
of an unrealized future. Since the C standard provides a list of reserved names,
it means the implementor or the programmer shouldn't define function names that
could conflict with the C standard.

Actually I think the C standard is wrong in reserving name spaces such as str*,
mem*, ... because it prevents the implementor or the programmer from using relevant
names and consequently from extending the de facto standard consistently.
Philip Lantz
2017-04-26 17:14:34 UTC
Post by Scott Lurndal
Post by bartc
Post by jacobnavia
Hi Bart
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
OK, strmode() is one of my functions (convert a type, or 'mode', into
string). Presumably it clashes with a C library function called
'strmode' (although I didn't see the problem with Windows compilers, nor
gcc on Linux).
If you're adding functions to the implementation that aren't defined
by one of the relevant standards, should you not be prepending at least
one underscore to the name? Otherwise, you'll likely break existing
programs (as per above).
The name strmode is reserved to the implementation, so the implementation is
free to add a function with that name. It is the program's use of the name
that causes the conflict.

Since the C source wasn't written by a programmer, but was generated by a
compiler, it is the compiler's responsibility to avoid such conflicts. The
compiler that is generating C code as output should add a prefix to all names
in its C output to avoid problems like this. (Note that a simple '_' prefix
won't do, because if a symbol in the original source already starts with '_'
or a capital letter, then that would change it into a reserved name.)
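
The prefixing idea could be sketched roughly as follows. This is only an illustration, not how bartc's translator works: the prefix bcc_ and the function mangle_name are hypothetical names chosen for the example. The one property that matters is that the prefix starts with an ordinary letter, so the result never begins with '_' or with str, mem or wcs, whatever the source identifier was:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix for every identifier emitted into the generated C. */
#define GEN_PREFIX "bcc_"

/* Hypothetical helper: writes GEN_PREFIX followed by `src` into `dst`.
   Returns 0 on success, or -1 if `dst` is too small to hold the result
   (in which case `dst` is left untouched). */
static int mangle_name(char *dst, size_t dstsize, const char *src)
{
    size_t plen = sizeof GEN_PREFIX - 1;   /* prefix length without the NUL */
    size_t slen;

    assert(dst != NULL && src != NULL);
    slen = strlen(src);
    if (plen + slen + 1 > dstsize)
        return -1;
    memcpy(dst, GEN_PREFIX, plen);
    memcpy(dst + plen, src, slen + 1);     /* copies the terminating NUL too */
    return 0;
}
```

With this scheme a source-language function called strmode would be emitted as bcc_strmode, and a source name such as _Foo would become bcc__Foo, which is no longer in the implementation's name space in C.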
s***@casperkitty.com
2017-04-26 19:25:32 UTC
Post by Philip Lantz
Since the C source wasn't written by a programmer, but was generated by a
compiler, it is the compiler's responsibility to avoid such conflicts. The
compiler that is generating C code as output should add a prefix to all names
in its C output to avoid problems like this. (Note that a simple '_' prefix
won't do, because if a symbol in the original source already starts with '_'
or a capital letter, then that would change it into a reserved name.)
There are two approaches a compiler can take:

1. Adjust any name which would match an implementation-reserved form, and
then require some special syntax in cases where user code knows of an
implementation-provided symbol that has a reserved name, and wishes to
use the meaning attached by the implementation.

2. Use whatever names programmers provide, but indicate that use of some
names may cause difficulties.

Note that the Standard was written at a time when implementers could be
relied upon to use common sense. While an implementation might be allowed
to break any code that uses "stretch" as an identifier (since it starts
with "str"), I think the expectation was that implementations would avoid
breaking code needlessly whether the Standard required it or not. Since
existing implementations' libraries already defined some str* functions,
forbidding libraries from including functions with those names would have
broken a lot of existing code. That does not imply, however, that functions
which are added in future shouldn't be defined in a new header, and use a
pattern like:

#define strnewthing __strnewthing

so that only code which includes the header would have to worry about a
naming conflict. Existing code, which would have had no reason to include
a header that didn't exist when it was written, would have been able to
use the name without difficulty.
Keith Thompson
2017-04-24 22:26:00 UTC
Post by jacobnavia
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
It's an implementation-defined function from BSD. The man page on my
system says:

#include <bsd/string.h>

void
strmode(mode_t mode, char *bp);

https://www.freebsd.org/cgi/man.cgi?query=strmode&sektion=3&apropos=0&manpath=freebsd

Of course names starting with "str" are reserved, so this kind of
problem is probably to be expected -- but you might try compiling with
"-std=c11 -pedantic" or something similar. (I'm assuming gcc because
your error message looks gcc-ish.)
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
jacobnavia
2017-04-26 09:36:11 UTC
Post by jacobnavia
Hi Bart
Got your compiler, looks impressive. Tried to compile it and got a
problem with
bartcc.c:1236:15: error: conflicting types for 'strmode'
char * strmode (int32,int32);
^
/usr/include/string.h:164:7: note: previous declaration is here
void strmode(int, char *);
What is that strmode?
I did not know that function.
I successfully compiled Bart's software with just one change: renaming
strmode to Strmode.