Discussion:
More C syntax checker
David Kleinecke
2017-07-17 02:36:04 UTC
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach. I will welcome comments:

assignment-expression = // reset rvalue
0 OPEN A
# 7
A type-name CLOSE 0 // set rvalue - push cast prefix operator
expression CLOSE 5 // primitive
4 OPEN B
# 7
B type-name CLOSE 4 // push cast prefix operator
expression CLOSE 5 // primitive
6 OPEN C
# 7
C type-name CLOSE 1 // object (size of type)
expression CLOSE 5 // primitive
3 OPEN 9
# 7
9 expression CLOSE 5 // primitive
7 INCREMENT 3 // push prefix operator
DECREMENT 3 // push prefix operator
SIZEOF 6
UnaryOperator 4 // push prefix operator
# 8
8 Identifier 5 // primitive
Constant 5 // primitive
StringLiteral 5 // primitive
5 START expression STOP 5
OPEN D
PERIOD Identifier 5
ARROW Identifier 5
INCREMENT 5
DECREMENT 5
# 1 // object (expanded primitive)
D assignment-expression E // prepare one argument
CLOSE 5
E COMMA D
CLOSE 5
1 AssignmentOperator 2 // ERROR if rvalue
QUERY expression COLON 2 // set rvalue
BinaryOperator 2 // set rvalue
# 1 // clear
2 # 0 // apply operator
Ben Bacarisse
2017-07-17 10:45:59 UTC
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
[...]
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
--
Ben.
David Kleinecke
2017-07-17 20:46:13 UTC
Post by Ben Bacarisse
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
[...]
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron, the same motive
that leads people to code in assembly. It also leads, IMO,
to a clearer understanding of the dividing line between
syntax and constraints/semantics. I could modify the machine
in many ways to avoid making unnecessary semantic rulings,
like noticing that
666.foo
is an error.

The machine works like this: there is a set of states,
indicated by the labels (numbers or letters) in the first
column. The labels have no numeric interpretation. In each
state there are several tests that are made in order. Each
test either identifies the current token or calls a function
(which is another state machine and returns either true
or false). If the token is recognized, it is consumed and
the next token is brought in. The label after the test
is the next state. I abbreviated runs of successive tests
that have no alternatives by writing them on one line.
The # "test" is the default. States without a # end in an
error if no test matches. The + means the machine function
returns true; all other returns are false. Token identifiers
in all caps are unique tokens. Tests of things in camel case
are tests for one of a set of tokens.
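To make the notation concrete, here is a minimal C sketch of an interpreter for this kind of table. It uses a toy grammar, not David's full machine, and every name in it (`enum tok`, `machine`, `check()`) is hypothetical. Each state holds an ordered list of tests; a matching token is consumed and names the next state, `T_DEFAULT` plays the role of `#` (taken without consuming input), `ACCEPT` stands in for `+`, and a state with no match and no default rejects. It also assumes the lexer has already classified operators as unary or binary, which in the real machine is decided by the state.

```c
#include <stddef.h>

/* Hypothetical token kinds; T_DEFAULT stands for the "#" test. */
enum tok { T_IDENT, T_CONST, T_UNARY, T_BINOP, T_END, T_DEFAULT };

#define ACCEPT (-1) /* plays the role of "+": the machine returns true */
#define REJECT (-2) /* no test matched and the state has no "#" row */

struct test { enum tok tok; int next; };

/* A state is an ordered list of tests; the first one that matches wins. */
struct state { int ntests; struct test tests[3]; };

/* Toy grammar: operand := {UNARY} (IDENT | CONST)
                expr    := operand {BINOP operand}                      */
static const struct state machine[] = {
    /* state 0: expect an operand */
    { 3, { { T_UNARY, 0 }, { T_IDENT, 1 }, { T_CONST, 1 } } },
    /* state 1: after an operand */
    { 2, { { T_BINOP, 0 }, { T_DEFAULT, ACCEPT } } },
};

/* Run the machine from state 0; return 1 only if it accepts all of `in`. */
static int check(const enum tok *in)
{
    int s = 0;
    size_t pos = 0;

    for (;;) {
        const struct state *st = &machine[s];
        int next = REJECT;

        for (int i = 0; i < st->ntests; i++) {
            const struct test *t = &st->tests[i];
            if (t->tok == T_DEFAULT) {   /* "#": taken without consuming */
                next = t->next;
                break;
            }
            if (t->tok == in[pos]) {     /* token recognized: consume it */
                pos++;
                next = t->next;
                break;
            }
        }
        if (next == REJECT)
            return 0;
        if (next == ACCEPT)
            return in[pos] == T_END;     /* the whole input must be used */
        s = next;
    }
}
```

For example, the token sequence for `a + -b` (IDENT BINOP UNARY IDENT END) is accepted, while `a +` is rejected when state 0 finds END instead of an operand.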
Malcolm McLean
2017-07-17 23:06:46 UTC
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
[...]
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron. The same motive
that leads people to code in assembly. It also leads IMO
to a clearer understanding of the dividing line between
syntax and constraints/semantics. I could modify the machine
in many ways to avoid making unnecessary semantic rulings -
like noticing that
666.foo
is an error.
The machine works like this: There is a set of states -
these are indicated by the numbers in the first column.
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false). If the token is recognized it is used and
the next token brought in. The number after the test
is the next state. I abbreviated series of successive
tests without any alternatives. The # "test" is the
default. States without a # end in an error. The +
means the machine function returns true - all other
returns are false. Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
A compiler written in the conventional manner works with a
lexer and a parser. The lexer separates the input into
tokens: keywords, operators, identifiers, numeric literals
and strings. The parser combines them into meaningful constructs,
such as loops, assignments, function calls and so on.

So an assignment would be lexed as

identifier, equals, numeric literal

then the parser would tag it as

lvalue assign to rvalue

so it's a two-stage process.

Now you could do what you have done and fold the lexer into the parser,
so that essentially the tokens are single characters. That would work, but
it makes the parser very complicated and unintuitive for human readers.
When we read C source we tokenise it ourselves: we think of
identifiers as discrete units, separated by whitespace and operators
from other syntactic units. The lexer matches that natural reading
of the basic units of meaning.
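A minimal C sketch of the two-stage split described above, assuming only identifiers, decimal integers and `=` exist: `next_token()` plays the lexer, and `is_simple_assignment()` is a stand-in parser that accepts exactly identifier, equals, numeric literal. Both names are hypothetical, and a real C lexer handles far more.

```c
#include <ctype.h>

enum kind { K_IDENT, K_EQUALS, K_NUMBER, K_END, K_ERROR };

/* Lexer: classify the token starting at *p and advance *p past it. */
static enum kind next_token(const char **p)
{
    const char *s = *p;
    enum kind k;

    while (isspace((unsigned char)*s))      /* whitespace separates tokens */
        s++;
    if (*s == '\0') {
        k = K_END;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        k = K_IDENT;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        k = K_NUMBER;
    } else if (*s == '=') {
        s++;
        k = K_EQUALS;
    } else {
        s++;                                /* skip the unknown character */
        k = K_ERROR;
    }
    *p = s;
    return k;
}

/* Parser stage: accept only  identifier '=' number  and nothing more. */
static int is_simple_assignment(const char *src)
{
    return next_token(&src) == K_IDENT
        && next_token(&src) == K_EQUALS
        && next_token(&src) == K_NUMBER
        && next_token(&src) == K_END;
}
```

`is_simple_assignment("x = 42")` returns 1; `"x == 42"` fails because the lexer produces two separate equals tokens and the parser expects a number after the first.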
David Kleinecke
2017-07-18 04:09:51 UTC
Post by Malcolm McLean
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
[...]
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron. The same motive
that leads people to code in assembly. It also leads IMO
to a clearer understanding of the dividing line between
syntax and constraints/semantics. I could modify the machine
in many ways to avoid making unnecessary semantic rulings -
like noticing that
666.foo
is an error.
The machine works like this: There is a set of states -
these are indicated by the numbers in the first column.
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false). If the token is recognized it is used and
the next token brought in. The number after the test
is the next state. I abbreviated series of successive
tests without any alternatives. The # "test" is the
default. States without a # end in an error. The +
means the machine function returns true - all other
returns are false. Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
A compiler written in the conventional manner works with a
lexer and a parser. The lexer separates the input out into
tokens; keywords, operators, identifiers, literal numbers
and strings. The parser combines them into meaningful lines,
such as loops, assignments, function calls and so on.
So an assignment would be lexed as
identifier, equals, numeric literal
then the parser would tag it as
lvalue assign to rvalue
so it's a two stage process.
Now you could do what you have done and fold the lexer into the parser.
Essentially the tokens are single characters. That would work. But
it makes the parser very complicated and non-humanly intuitive.
Generally when we're reading C source we tokenise it, we think of
identifiers as discrete units separated by whitespace and operators
from other syntactical units. The lexer matches the natural reading
of basic units of meaning.
In a C context the "lexer" is an aspect of the preprocessor.
In all modern compilers there is also a post-processor that
changes the parser output into actual machine code.
Keith Thompson
2017-07-18 16:06:46 UTC
David Kleinecke <***@gmail.com> writes:
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Post by David Kleinecke
In all modern compilers there is also a post-processor that
changes the parser output into actual machine code.
I'm not sure what you mean by "post-processor" here. There isn't
necessarily an explicit representation of the parser output. It depends
on the internal design of the compiler.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Kleinecke
2017-07-18 16:57:11 UTC
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
pass the tokens on from the preprocessor to the parser (after making
the obvious changes). If a copy of the preprocessor output in
source code form is required, it is easily reconstructed from
the tokens.
Post by Keith Thompson
Post by David Kleinecke
In all modern compilers there is also a post-processor that
changes the parser output into actual machine code.
I'm not sure what you mean by "post-processor" here. There isn't
necessarily an explicit representation of the parser output. It depends
on the internal design of the compiler.
The parser (almost?) always generates an intermediate form in
a "language" of its own. There must be code (which I call the
post-processor) to change that intermediate form to actual
machine code.

Ritchie's original C compiler did parse directly to assembly
language - but I suspect no later compiler ever did.
Scott Lurndal
2017-07-18 17:19:03 UTC
Post by David Kleinecke
Post by Keith Thompson
I'm not sure what you mean by "post-processor" here. There isn't
necessarily an explicit representation of the parser output. It depends
on the internal design of the compiler.
The parser (almost?) always generates an intermediate form in
a "language" of its own. There must be code (which I call the
post-processor) to change that intermediate form to actual
machine code.
Ritchie's original C compiler did parse directly to assembly
language - but I suspect no later compiler ever did.
The V6 compiler I'm currently looking at first generates
an expression tree, then generates code from that tree.

c10.c:

/*
* Try to compile the tree with the code table using
* registers areg and up. If successful,
* return the register where the value actually ended up.
* If unsuccessful, return -1.
*
* Most of the work is the macro-expansion of the
* code table.
*/
cexpr(atree, table, areg)
struct tnode *atree;
struct table *table;
{
int c, r;
register struct tnode *p, *p1, *tree;
struct table *ctable;
struct tnode *p2;
char *string;
int reg, reg1, rreg, flag, opd;
char *opt;

tree = atree;
reg = areg;
p1 = tree->tr2;
c = tree->op;
opd = opdope[c];
/*
* When the value of a relational or a logical expression is
* desired, more work must be done.
*/
if ((opd&RELAT||c==LOGAND||c==LOGOR||c==EXCLA) && table!=cctab) {
cbranch(tree, c=isn++, 1, reg);
rcexpr(&czero, table, reg);
branch(isn, 0);
label(c);
rcexpr(&cone, table, reg);
label(isn++);
return(reg);
}
if(c==QUEST) {
if (table==cctab)
return(-1);
cbranch(tree->tr1, c=isn++, 0, reg);
flag = nstack;
rreg = rcexpr(p1->tr1, table, reg);
nstack = flag;
branch(r=isn++, 0);
label(c);
reg = rcexpr(p1->tr2, table, rreg);
if (rreg!=reg)
printf("mov%c r%d,r%d\n",
isfloat(tree),reg,rreg);
label(r);
return(rreg);
}
...
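The tree-then-code structure can be sketched in a few lines of C. This is not the V6 code: the `tnode` layout and the stack-machine output (`push`, `add`, `mul`) are hypothetical, and only `+`, `*` and integer leaves are handled. A parser (not shown) would build the whole tree first; `gen()` then walks it in post-order, emitting both operands before their operator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tree node: op is '+', '*', or 'n' for a numeric leaf. */
struct tnode {
    char op;
    int value;                          /* used only when op == 'n' */
    const struct tnode *left, *right;   /* used only for operators */
};

/* Second pass: walk the finished tree, appending stack-machine code to out. */
static void gen(const struct tnode *t, char *out, size_t size)
{
    size_t len = strlen(out);

    if (t->op == 'n') {
        snprintf(out + len, size - len, "push %d\n", t->value);
        return;
    }
    gen(t->left, out, size);            /* code for both operands first */
    gen(t->right, out, size);
    len = strlen(out);
    snprintf(out + len, size - len, "%s\n", t->op == '+' ? "add" : "mul");
}
```

For the tree of `(2+3)*4`, `gen()` appends `push 2`, `push 3`, `add`, `push 4`, `mul` to an initially empty buffer.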
Keith Thompson
2017-07-18 20:40:35 UTC
Post by David Kleinecke
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
continue the tokens from pre-processor to parser (after making
the obvious changes). If a copy of the pre-processor output in
source code form is required it is easily reconstructed from
the tokens.
I can think of at least one good reason. Text is a well-defined format,
and is easy to work with. Passing a non-text token stream from the
preprocessor to the rest of the compiler would require defining some
format that both would have to be aware of. Tokenizing is defined
differently for the preprocessor (preprocessor tokens) and for the rest
of the compiler (tokens).

Certainly early C compilers had a preprocessor that generated C source
code that was then compiled. (Very early C compilers probably didn't
even have a preprocessor.) The standard doesn't require preprocessor
output to be in text form, but it's designed to permit it.

gcc, at least, has an option to dump the preprocessor output as text.

I don't know whether gcc (or any other compiler) has its preprocessor
generate text in normal operation, but it certainly could. And lexing
is a cheap enough operation that it wouldn't significantly hurt
performance.
Post by David Kleinecke
Post by Keith Thompson
Post by David Kleinecke
In all modern compilers there is also a post-processor that
changes the parser output into actual machine code.
I'm not sure what you mean by "post-processor" here. There isn't
necessarily an explicit representation of the parser output. It depends
on the internal design of the compiler.
The parser (almost?) always generates an intermediate form in
a "language" of its own. There must be code (which I call the
post-processor) to change that intermediate form to actual
machine code.
The parser can create a data structure dynamically in memory. It
needn't keep information about the entire translation unit; for example,
it might discard information about each function after it finishes
processing it. Parsing and code generation can be integrated, so that
each grammatical production triggers some semantic action.

A parser *can* create an abstract syntax tree, and that tree may or may
not be in a form that can be written to a file. But that's not the only
approach.
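For contrast, here is a sketch of the integrated approach, where each grammar production emits code as soon as it is recognized and no tree is kept. It is a toy recursive-descent translator in C for single-digit operands with `+` and `*`; the grammar, the `xlat` struct and the output instructions are assumptions for illustration only.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Grammar: expr := term {'+' term}; term := factor {'*' factor};
   factor := digit.  Code is emitted during parsing; no tree is built. */
struct xlat { const char *p; char *out; size_t size; int ok; };

static void emit(struct xlat *x, const char *text)
{
    size_t len = strlen(x->out);
    snprintf(x->out + len, x->size - len, "%s\n", text);
}

static void factor(struct xlat *x)
{
    if (isdigit((unsigned char)*x->p)) {
        char buf[16];
        snprintf(buf, sizeof buf, "push %c", *x->p++);
        emit(x, buf);                   /* semantic action for factor */
    } else {
        x->ok = 0;                      /* syntax error */
    }
}

static void term(struct xlat *x)
{
    factor(x);
    while (x->ok && *x->p == '*') {
        x->p++;
        factor(x);
        if (x->ok)
            emit(x, "mul");             /* emitted after both operands */
    }
}

static void expr(struct xlat *x)
{
    term(x);
    while (x->ok && *x->p == '+') {
        x->p++;
        term(x);
        if (x->ok)
            emit(x, "add");
    }
}

/* Translate src into out; return 1 if the whole input was valid. */
static int translate(const char *src, char *out, size_t size)
{
    struct xlat x = { src, out, size, 1 };
    out[0] = '\0';
    expr(&x);
    return x.ok && *x.p == '\0';
}
```

`translate("2+3*4", buf, sizeof buf)` emits `push 2`, `push 3`, `push 4`, `mul`, `add`: precedence falls out of the call structure, because `term()` finishes the multiplication before `expr()` emits the addition.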
Post by David Kleinecke
Ritchie's original C compiler did parse directly to assembly
language - but I suspect no later compiler ever did.
There are a few people here who work on their own C compilers. Perhaps
they can comment on what internal representations they use.
j***@verizon.net
2017-07-18 23:01:18 UTC
Post by David Kleinecke
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
continue the tokens from pre-processor to parser (after making
the obvious changes). If a copy of the pre-processor output in
source code form is required it is easily reconstructed from
the tokens.
I'm not sure precisely what you mean by that. Consider the following source code:

function(0x3.EFp-6, 'c', "string")

Could you put together an example of what the output from the preprocessor for that line of code could look like? Most of the compilers I've used over the past several decades have had a -E option which showed the results of the preprocessing phase, and they all would have produced output from that line which looked exactly like the input (assuming that 'function' is not the name of a macro), except possibly for the handling of white space that is not part of a character constant or string literal. The contents of that line were carefully chosen to avoid several different complications (such as trigraphs, digraphs, members of the extended source character set, escaped newlines, comments, UCNs, etc.), so in your reply there's no need to mention the complications that were successfully avoided.
David Kleinecke
2017-07-22 01:13:48 UTC
Post by j***@verizon.net
Post by David Kleinecke
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
continue the tokens from pre-processor to parser (after making
the obvious changes). If a copy of the pre-processor output in
source code form is required it is easily reconstructed from
the tokens.
function(0x3.EFp-6, 'c', "string")
Could you put together an example of what the output from the preprocessor for that line of code could look like? Most of the compilers I've used over the past several decades have had a -E option which showed the results of the preprocessing phase, and they all would have produced output from that line which looked exactly like the input (assuming that 'function' is not the name of a macro), except possibly for the handling of white space that is not part of a character constant or string literal. The contents of that line were carefully chosen to avoid several different complications (such as trigraphs, digraphs, members of the extended source character set, escaped newlines, comments, UCNs, etc.), so in your reply there's no need to mention the complications that were successfully avoided.
function(0x3.EFp-6, 'c', "string")
The tokens generated, in order (whether as indices or pointers):

function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)

so?
Keith Thompson
2017-07-22 02:40:57 UTC
[...]
Post by David Kleinecke
Post by j***@verizon.net
function(0x3.EFp-6, 'c', "string")
Could you put together an example of what the output from the
preprocessor for that line of code could look like? Most of the
compilers I've used over the past several decades have had a -E
option which showed the results of the pre-processing phase, and they
all would have produced output from that line which looked exactly
like the input (assuming that 'function' is not the name of a macro),
except possibly for the handling of white space that is not part of a
character constant or string literal. The contents of that line were
carefully chosen to avoid several different complications (such as
trigraphs, digraphs, members of the extended source character set,
escaped newlines, comments, UCNs, etc.), so there's no need to
mention the complications that were successfully avoided, in your
reply.
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
so?
Let me guess, you're ignoring everything after C90.

0x3.EFp-6 is a single token, a hexadecimal floating-point constant.
(Its value is 0.06146240234375 .)
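A quick C99 check of that value: the significand `0x3.EF` is 3 + 0xEF/256 = 1007/256, and `p-6` scales it by 2 to the power -6, giving 1007/16384. The variable and function names are illustrative. A C90-only compiler will reject the constant, since hexadecimal floating constants were added in C99.

```c
#include <stdio.h>

/* One C99 token: hex significand 0x3.EF (= 1007/256) times 2 to the power -6. */
static const double hexfloat = 0x3.EFp-6;

/* The same value as an ordinary fraction: 1007 / (256 * 64). */
static const double as_fraction = 1007.0 / 16384.0;

static void show_hexfloat(void)
{
    printf("%.14f\n", hexfloat);   /* prints 0.06146240234375 */
}
```

The value is exactly representable in binary, so `hexfloat` and `as_fraction` compare equal exactly.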

Your tokenizer converts the constant 0x3 to 3, and 'c' to 63? That's
going to break token-pasting. You should leave them in their original
form and let later phases worry about evaluating them.

In what sense is
a pointer to "string"
a token? How is it represented? How are these tokens represented
in general? I'm guessing that "a pointer to "string"" is some kind
of pseudo-code, but I don't know how to interpret it. (I'm concerned
about breaking `sizeof "string"`, but I don't understand what you're
saying well enough to know whether that's an issue.)
David Kleinecke
2017-07-22 03:21:24 UTC
Post by Keith Thompson
[...]
Post by David Kleinecke
Post by j***@verizon.net
function(0x3.EFp-6, 'c', "string")
Could you put together an example of what the output from the
preprocessor for that line of code could look like? Most of the
compilers I've used over the past several decades have had a -E
option which showed the results of the pre-processing phase, and they
all would have produced output from that line which looked exactly
like the input (assuming that 'function' is not the name of a macro),
except possibly for the handling of white space that is not part of a
character constant or string literal. The contents of that line were
carefully chosen to avoid several different complications (such as
trigraphs, digraphs, members of the extended source character set,
escaped newlines, comments, UCNs, etc.), so there's no need to
mention the complications that were successfully avoided, in your
reply.
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
so?
Let me guess, you're ignoring everything after C90.
0x3.EFp-6 is a single token, a hexadecimal floating-point constant.
(Its value is 0.06146240234375 .)
Your tokenizer converts the constant 0x3 to 3, and 'c' to 63? That's
going to break token-pasting. You should leave them in their original
form and let later phases worry about evaluating them.
In what sense is
a pointer to "string"
a token? How is it represented? How are these tokens represented
in general? I'm guessing that "a pointer to "string"" is some kind
of pseudo-code, but I don't know how to interpret it. (I'm concerned
about breaking `sizeof "string"`, but I don't understand what you're
saying well enough to know whether that's an issue.)
Each token has characteristics attached (often empty).
A constant token points at a value. A string token points
at a string. A character constant can't be meaningfully
pasted, so I convert character constants to numeric values.
Keith Thompson
2017-07-22 03:33:35 UTC
[...]
Post by David Kleinecke
Post by Keith Thompson
Post by David Kleinecke
Post by j***@verizon.net
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
so?
Let me guess, you're ignoring everything after C90.
0x3.EFp-6 is a single token, a hexadecimal floating-point constant.
(Its value is 0.06146240234375 .)
Your tokenizer converts the constant 0x3 to 3, and 'c' to 63? That's
going to break token-pasting. You should leave them in their original
form and let later phases worry about evaluating them.
In what sense is
a pointer to "string"
a token? How is it represented? How are these tokens represented
in general? I'm guessing that "a pointer to "string"" is some kind
of pseudo-code, but I don't know how to interpret it. (I'm concerned
about breaking `sizeof "string"`, but I don't understand what you're
saying well enough to know whether that's an issue.)
Each token has characteristics attached (often empty).
A constant token points at a value. A string token points
at a string. A character constant can't be meaningfully
pasted so I make them numerics.
Character and numeric constants can be stringized. If you throw away
the distinction between 0x3 and 3, or between 'c' and 0x63, that's going
to break.
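A minimal C sketch of why the original spelling matters: the `#` operator turns a macro argument's spelling into a string literal, and `##` joins two preprocessing tokens into one, so neither can work on tokens that have already been converted to values. `STR` and `CAT` follow the usual idiom but are hypothetical names here.

```c
/* # turns a macro argument's spelling into a string literal. */
#define STR(x) #x
/* ## joins two preprocessing tokens into a single token. */
#define CAT(a, b) a ## b

static const char hex_spelling[] = STR(0x3);   /* "0x3", not "3" */
static const char char_spelling[] = STR('c');  /* "'c'", not "99" or "0x63" */
static const int pasted = CAT(0x, 3);          /* the single token 0x3 */
```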

I still have no idea how your tokens are represented. Is it just an
in-memory data structure?
David Kleinecke
2017-07-22 16:36:10 UTC
Post by Keith Thompson
[...]
Post by David Kleinecke
Post by Keith Thompson
Post by David Kleinecke
Post by j***@verizon.net
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
so?
Let me guess, you're ignoring everything after C90.
0x3.EFp-6 is a single token, a hexadecimal floating-point constant.
(Its value is 0.06146240234375 .)
Your tokenizer converts the constant 0x3 to 3, and 'c' to 63? That's
going to break token-pasting. You should leave them in their original
form and let later phases worry about evaluating them.
In what sense is
a pointer to "string"
a token? How is it represented? How are these tokens represented
in general? I'm guessing that "a pointer to "string"" is some kind
of pseudo-code, but I don't know how to interpret it. (I'm concerned
about breaking `sizeof "string"`, but I don't understand what you're
saying well enough to know whether that's an issue.)
Each token has characteristics attached (often empty).
A constant token points at a value. A string token points
at a string. A character constant can't be meaningfully
pasted so I make them numerics.
Character and numeric constants can be stringized. If you throw away
the distinction between 0x3 and 3, or between 'c' and 0x63, that's going
to break.
I still have no idea how your tokens are represented. Is it just an
in-memory data structure?
I usually visualize a token as an index into a data array
but I might use pointers to data structs instead. A design
detail. The data for each token includes its original
representation. This is necessary for error reports if no
other reason. I am aware that "original representation"
might not be what is written in the source code but the
coder must live with that.
Keith Thompson
2017-07-22 23:04:25 UTC
[...]
Post by David Kleinecke
Post by Keith Thompson
I still have no idea how your tokens are represented. Is it just an
in-memory data structure?
I usually visualize a token as an index into a data array
but I might use pointers to data structs instead. A design
detail. The data for each token includes its original
representation. This is necessary for error reports if no
other reason. I am aware that "original representation"
might not be what is written in the source code but the
coder must live with that.
So is it an in-memory data structure?

I was asking what the output of your preprocessor looks like.

Some preprocessors produce output as plain text that can be stored
in a file, where token boundaries are not represented explicitly.
The fact that you're talking about pointers and arrays makes me
think yours exists only in memory while the preprocessor is running,
where tokens are represented explicitly as tokens and later phases
of the compiler don't have to re-tokenize anything.

I just wanted to clarify that point.
j***@verizon.net
2017-07-22 18:50:38 UTC
Post by David Kleinecke
Post by j***@verizon.net
Post by David Kleinecke
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
continue the tokens from pre-processor to parser (after making
the obvious changes). If a copy of the pre-processor output in
source code form is required it is easily reconstructed from
the tokens.
function(0x3.EFp-6, 'c', "string")
Could you put together an example of what the output from the preprocessor for that line of code could look like? Most of the compilers I've used over the past several decades have had a -E option which showed the results of the pre-processing phase, and they all would have produced output from that line which looked exactly like the input (assuming that 'function' is not the name of a macro), except possibly for the handling of white space that is not part of a character constant or string literal. The contents of that line were carefully chosen to avoid several different complications (such as trigraphs, digraphs, members of the extended source character set, escaped newlines, comments, UCNs, etc.), so there's no need to mention, in your reply, the complications that were successfully avoided.
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
I was assuming you were talking about a compiler operating in a mode that conforms, at least roughly, to the C standard. The output from the pre-processing phase should therefore identify the following pre-processing tokens:

identifier: "function"
punctuator: '('
pp-number: "0x3.EFp-6"
punctuator: ','
character-constant: 'c'
punctuator: ','
string literal: "string"
punctuator: ')'

Your compiler thoroughly messed up its identification of the pp-number. It's supposed to parse as a single pre-processing token during translation phase 3, and then be converted to a single token during translation phase 7. Since you've incorrectly parsed it as 5 (pre-processing?) tokens, it's not clear to me whether your compiler will be able to reassemble them into a single hexadecimal floating point constant during translation phase 7, as required by the C standard, particularly since it arbitrarily dropped the '0x' prefix. If I had written 3.EFp-6, would your preprocessor have parsed it the same way? If so, how do you distinguish between them in downstream processing? One is a perfectly valid constant, the other doesn't qualify as any valid kind of token.
Post by David Kleinecke
so?
Well, if you hadn't mangled the handling of the pp-number so badly, my point would have been that the pre-processing tokens (correctly) parsed out of the above source code must each, in order to be handled properly during downstream processing, contain information about the sequence of characters that they were parsed from. There are many options for how that information could be stored, but a simple character constant or string containing equivalent C source code would seem the simplest way to store the information that needs to be stored for each of those token types.
Because of all of the processing that occurs between phase 1 and phase 4, those strings might not correspond to actual pieces of the input source code, but they would, normally, look like source code that could have been input, and would have had the same meaning. And that's all that Keith was saying when he mentioned that the information is commonly transmitted as source code.

What I was looking for in your response (though I correctly anticipated that you would not provide it) was some indication of how the information that those tokens needed to store could be stored in a form that wasn't closely equivalent to C source code, thereby justifying your objection to Keith's statement. Except for your mishandling of the pp-number, the exact sequence of characters from the source code that makes up each pre-processing token survives intact as part of your pre-processing output.
Ben Bacarisse
2017-07-22 19:18:49 UTC
<snip>
Post by j***@verizon.net
Post by David Kleinecke
Post by j***@verizon.net
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
I was assuming you were talking about a compiler operating in a mode
that conforms, at least roughly, to the C standard.
He's working to C90. I once asked why but I didn't get an answer.
Post by j***@verizon.net
The output from the pre-processing phase should therefore identify the
identifier: "function"
punctuator: '('
pp-number: "0x3.EFp-6"
so in C90 it would be:

pp-number: 0x3.EFp
punctuator: -
pp-number: 6
Post by j***@verizon.net
punctuator: ','
character-constant: 'c'
punctuator: ','
string literal: "string"
punctuator: ')'
<snip>
--
Ben.
David Kleinecke
2017-07-23 03:02:33 UTC
Post by Ben Bacarisse
He's working to C90. I once asked why but I didn't get an answer.
I thought I had made it clear that I always use C89.
I generally deplore most of the later "improvements"
and it is easier to stay in C89 than use a later C
and ignore most of the new features.
Ben Bacarisse
2017-07-23 12:10:01 UTC
Post by David Kleinecke
Post by Ben Bacarisse
He's working to C90. I once asked why but I didn't get an answer.
I thought I had made it clear that I always use C89.
I generally deplore most of the later "improvements"
and it is easier to stay in C89 than use a later C
and ignore most of the new features.
That can only be part of the reason unless you are doing whatever work
you are doing just for yourself. If I were to write a C parser for some
public purpose I would discount my own feelings about which C standard
to consider.
s***@casperkitty.com
2017-07-22 22:27:45 UTC
Post by j***@verizon.net
Your compiler thoroughly messed up its identification of the pp-number. It's supposed to parse as a single pre-processing token during translation phase 3, and then be converted to a single token during translation phase 7.
One would have to work very hard to write a strictly-conforming program that
would be able to tell whether PP-number was kept as a single token, or
whether other aspects of compilation were willing to process the sequence of
tokens in suitable fashion. If a text scanner breaks 0.123E+5 into three
portions:

Numeric literal-with-E-suffix "0.123" (NLWE)
Plus
Integer literal "5" (IL)

the grammar could then accommodate

NLWE + IL
NLWE - IL
NLWE IL

as valid constructions which yield floating-point literals. If the token
operator simply treats the things on either side as pieces of text without
whitespace between them, then joining 0.123E and +5 would work just fine.

While there might conceivably be some non-contrived programs that would be
broken by a compiler whose tokenizer ignored the definition of pp-numbers
and instead regarded 0x1.23P+5 as starting with a token that means "0x1.23
raised to the following hex power", I suspect they are far more rare than
situations where the definition of PP-number inappropriately breaks
code whose clear and obvious meaning would have been understood without
difficulty by compilers before the authors of the Standard decided that
handling it would be "too difficult".
David Kleinecke
2017-07-23 02:49:46 UTC
Post by j***@verizon.net
Post by David Kleinecke
Post by j***@verizon.net
Post by David Kleinecke
Post by Keith Thompson
[...]
Post by David Kleinecke
In a C context the "lexer" is an aspect of the preprocessor.
A preprocessor needs to split its input into preprocessor tokens, but in
a compiler where the preprocessor is implemented as a separate pass, the
main body of the compiler also has to include a lexer. The way
information is passed from the preprocessor to the rest of the compiler
is not specified, but it's commonly done as source code.
Sorry, I don't believe that. There is no reason at all not to
continue the tokens from pre-processor to parser (after making
the obvious changes). If a copy of the pre-processor output in
source code form is required it is easily reconstructed from
the tokens.
function(0x3.EFp-6, 'c', "string")
Could you put together an example of what the output from the preprocessor for that line of code could look like? Most of the compilers I've used over the past several decades have had a -E option which showed the results of the pre-processing phase, and they all would have produced output from that line which looked exactly like the input (assuming that 'function' is not the name of a macro), except possibly for the handling of white space that is not part of a character constant or string literal. The contents of that line were carefully chosen to avoid several different complications (such as trigraphs, digraphs, members of the extended source character set, escaped newlines, comments, UCNs, etc.), so there's no need to mention, in your reply, the complications that were successfully avoided.
function(0x3.EFp-6, 'c', "string")
function
(
3
.
EFp
-
6
,
63
,
a pointer to "string"
)
identifier: "function"
punctuator: '('
pp-number: "0x3.EFp-6"
punctuator: ','
character-constant: 'c'
punctuator: ','
string literal: "string"
punctuator: ')'
Your compiler thoroughly messed up its identification of the pp-number.
According to my reading of the C89 standard "p-" cannot occur in
a single pp-number. So the pp-token sequence is really
0x3.EFp
-
6
The expansion of 0x3.EFp should, as you point out, really happen
at step 7, and I should move it there. The actual expansion given
is my idea of what an error processor should do; looking hard
at it, I think I'd better change that. Thank you for the useful
comment.

It's supposed to parse as a single pre-processing token during translation phase 3, and then be converted to a single token during translation phase 7. Since you've incorrectly parsed it as 5 (pre-processing?) tokens, it's not clear to me whether your compiler will be able to reassemble them into a single hexadecimal floating point constant during translation phase 7, as required by the C standard, particularly since it arbitrarily dropped the '0x' prefix. If I had written 3.EFp-6, would your preprocessor have parsed it the same way? If so, how do you distinguish between them in downstream processing? One is a perfectly valid constant, the other doesn't qualify as any valid kind of token.
Post by j***@verizon.net
Post by David Kleinecke
so?
Well, if you hadn't mangled the handling of the pp-number so badly, my point would have been that the pre-processing tokens (correctly) parsed out of the above source code must each, in order to be handled properly during downstream processing, contain information about the sequence of characters that they were parsed from. There are many options for how that information could be stored, but a simple character constant or string containing equivalent C source code would seem the simplest way to store the information that needs to be stored for each of those token types.
Because of all of the processing that occurs between phase 1 and phase 4, those strings might not correspond to actual pieces of the input source code, but they would, normally, look like source code that could have been input, and would have had the same meaning. And that's all that Keith was saying when he mentioned that the information is commonly transmitted as source code.
What I was looking for in your response (though I correctly anticipated that you would not provide it) was some indication of how the information that those tokens needed to store could be stored in a form that wasn't closely equivalent to C source code, thereby justifying your objection to Keith's statement. Except for your mishandling of the pp-number, the exact sequence of characters from the source code that makes up each pre-processing token survives intact as part of your pre-processing output.
As I mentioned elsewhere part of the data associated with each
token is the character string underlying it. The step 4 diddling
really only affects identifiers but can affect numbers. What I
store with the token is the string that was tokenized after all
pasting and stringizing. String concatenation will also
change the apparent original text of a token.
Keith Thompson
2017-07-23 03:56:48 UTC
[...]
Post by David Kleinecke
Post by j***@verizon.net
Your compiler thoroughly messed up it's identification of the
pp-number.
According to my reading of the C89 standard "p-" cannot occur in
a single pp-number.
You're correct about C89. The current standard, C11, as well as
the previous standard, C99, do permit "p-" to occur in a pp-number.

When discussing your parser, I strongly suggest that you mention
explicitly that you're writing a C89 (or C90) parser. Don't expect
anyone to remember that you're deliberately ignoring the current
standard. If you say you're doing C, a lot of people are going to
assume, quite reasonably, that you mean C11.
GOTHIER Nathan
2017-07-23 11:27:10 UTC
On Sat, 22 Jul 2017 20:56:48 -0700
Post by Keith Thompson
When discussing your parser, I strongly suggest that you mention
explicitly that you're writing a C89 (or C90) parser. Don't expect
anyone to remember that you're deliberately ignoring the current
standard. If you say you're doing C, a lot of people are going to
assume, quite reasonably, that you mean C11.
If anybody is talking about C, I'm assuming it is at least conforming to K&R SE
(aka ISO/IEC 9899:1990). If the C committee broke the backward compatibility
with K&R SE, it means its members killed the C standard for their own interest.

Nevertheless I am only considering the de facto standard as the true standard.
s***@casperkitty.com
2017-07-23 13:56:19 UTC
Post by GOTHIER Nathan
If anybody is talking about C, I'm assuming it is at least conforming to K&R SE
(aka ISO/IEC 9899:1990). If the C committee broke the backward compatibility
with K&R SE, it means its members killed the C standard for their own interest.
Situations in which a strictly conforming C89 program would be broken by
C99's expansion of what "pp-number" means are pretty rare, and would have
been better handled in C89 by recognizing that certain forms may be scanned
in Unspecified fashion [so as not to require compilers to add code to break
production code that would have been unambiguously handled by many if not
most pre-Standard compilers].

Outside of situations involving the stringize operator and backslash,
there is only one situation where C89 would not work with a scanner
which operated by dividing every character into one of three groups
(whitespace, alphanumeric, and other), gobbled up groups of whitespace
and set a flag but otherwise ignored them, gobbled up groups of alpha-
numerics, and output other tokens singly but with a whitespace flag.
Operators like "++" would be handled by saying that if a + with or
without whitespace is followed by a + with no whitespace it should
behave as an increment operator. Support for the stringize operator
would require keeping for each token a copy of its source-code
representation, but the only other problematic case would be ensuring
that after something like "#define E 9" a value like 1.E7 doesn't get
turned into 1.97; that case could have been handled by simply saying
that any code which defines a macro for E or e must write E-format
values either without a decimal point, or with at least one digit
between it and the E.

Adding support for P-format numbers complicates things a little, and
would make it necessary for the tokenizer to regard "." as being part
of tokens that start with digits but not those that start with letters.
Still, that's not as bad as requiring special rules for the letters
"p", "P", "e", and "E", which aren't applicable to other letters.
j***@verizon.net
2017-07-24 15:31:46 UTC
Post by GOTHIER Nathan
On Sat, 22 Jul 2017 20:56:48 -0700
Post by Keith Thompson
When discussing your parser, I strongly suggest that you mention
explicitly that you're writing a C89 (or C90) parser. Don't expect
anyone to remember that you're deliberately ignoring the current
standard. If you say you're doing C, a lot of people are going to
assume, quite reasonably, that you mean C11.
If anybody is talking about C, I'm assuming it is at least conforming to K&R
SE (aka ISO/IEC 9899:1990).
Those two are not the same. K&R SE was first published before some of the last few changes were made to C90, and does not reflect those changes.
Post by GOTHIER Nathan
... If the C committee broke the backward compatibility
with K&R SE,
The committee has a great deal of respect for the importance of backwards
compatibility - so much so that many people who want to add all kinds of
changes to the C standard think that it pays too much respect to that
principle. However, it's only one of several principles that the committee
considers important, and they did indeed break backwards compatibility in a
couple of cases with C99, because one of those other principles was considered
more important. Most notably, most of the functions added to <math.h> have
names that were in the name space reserved for users by C90.

However, hexadecimal constants are definitely not an example of this. The
syntax was carefully chosen to be a syntax error in C90. Backwards
compatibility is a goal that the C committee supports, but only for strictly
conforming code; it does NOT include requiring that new versions of C continue
to treat as syntax errors or constraint violations everything that previous
versions did.
Post by GOTHIER Nathan
... it means its members killed the C standard for their own
interest.
I'm curious - what self-interest do you think was involved? As far as I know,
both hexadecimal floating point and the new math functions I mentioned above
were added in C99 to support the needs of people who write code that does a lot
of number crunching. You might not be one of those people, in which case you
might not appreciate their needs, and might even resent changes made to the
language to support them. However, those people do exist, and what they lack in
numbers they make up for in the sheer number of floating point instructions
executed per second by their programs. Many of those programs run on
high-powered machines, and keep those machines busy almost full-time. Catering
to the needs of that community was not a matter of committee self-interest.
Post by GOTHIER Nathan
Nevertheless I am only considering the de facto standard as the true standard.
I consider only the currently approved standard, which officially replaced the
older ones, to be the true standard. I admit that it's commonplace for people
to target older versions of the standard. However, something that's "de facto"
cannot, by its very nature, be a real standard, since "de facto", people write
code that doesn't conform to any official standard. The "de facto" standard is
definitely NOT C90 - it's some weird mixture of C90 and the most popular of the
extensions to C90 that are accepted by some widely used compilers. The "de
facto" standard is starting to include some C99isms.
s***@casperkitty.com
2017-07-24 18:25:12 UTC
Post by j***@verizon.net
The committee has a great deal of respect for the importance of backwards
compatibility - so much so that many people who want to add all kinds of
changes to the C standard think that it pays too much respect to that
principle. However, it's only one of several principles that the committee
considers important, and they did indeed break backwards compatibility in a
couple of cases with C99, because one of those other principles was considered
more important. Most notably, most of the functions added to <math.h> have
names that were in the name space reserved for users by C90.
I wonder what "problem" there would have been with explicitly allowing
(and even recommending) that new functions use "linker" names which are
in the reserved space, and requiring that header files use the pattern:

#define newFunction __newFunction
float __newFunction(float);

with implementations being allowed to use any reserved names they saw fit,
or else the names of the new functions (the latter being allowed for
compatibility with code targeting any existing implementations that might
have already been providing such names).

Code which wants to use newFunction() would be required to include the proper
header, rather than being allowed to define the prototype itself, but the
existence of new functions would have no effect upon any existing code that
didn't include the new header, even if it exported symbols with the same
names as new functions. In addition, user code would be able to say:

#ifndef newFunction
#define newFunction(x,y) myAlternativeToNewFunction((x),(y))
#endif

and benefit from the new function when it exists, while also working just
fine (though perhaps less efficiently) on systems where it doesn't.
Post by j***@verizon.net
However, hexadecimal constants are definitely not an example of this. The
syntax was carefully chosen to be a syntax error in C90. Backwards
compatibility is a goal that the C committee supports, but only for strictly
conforming code; it does NOT include requiring that new versions of C continue
to treat as syntax errors or constraint violations everything that previous
versions did.
One could write strictly-conforming programs which behave differently on
C90 and C99 because of the changed definition of pp-number. Given, e.g.

#define QQ qq

a function that stringizes 0x1.0p+QQ after performing macro substitutions
would yield the string literal "0x1.0p+QQ" or "0x1.0p+qq" based upon whether
the preprocessor regarded "+QQ" as part of the preceding token. The
preprocessor need not know nor care whether the token is valid, and if it
doesn't survive preprocessing except in stringized form, the later stages of
compilation wouldn't get to see it either.
Post by j***@verizon.net
I consider only the currently approved standard, which officially replaced the
older ones, to be the true standard. I admit that it's commonplace for people
to target older versions of the standard. However, something that's "de facto"
cannot, by its very nature, be a real standard, since "de facto", people write
code that doesn't conform to any official standard. The "de facto" standard is
definitely NOT C90 - it's some weird mixture of C90 and the most popular of the
extensions to C90 that are accepted by some widely used compilers. The "de
facto" standard is starting to include some C99isms.
A major purpose of official standards is to tell people how things should
behave *in cases where it would otherwise be unclear*. If everybody would
do something the same way even in the absence of a standard, nothing would
be gained by the presence of a standard unless or until someone had a reason
to want to do something else.
David Kleinecke
2017-07-24 18:45:23 UTC
Post by s***@casperkitty.com
Post by j***@verizon.net
The committee has a great deal of respect for the importance of backwards
compatibility - so much so that many people who want to add all kinds of
changes to the C standard think that it pays too much respect to that
principle. However, it's only one of several principles that the committee
considers important, and they did indeed break backwards compatibility in a
couple of cases with C99, because one of those other principles was considered
more important. Most notably, most of the functions added to <math.h> have
names that were in the name space reserved for users by C90.
I wonder what "problem" there would have been with explicitly allowing
(and even recommending) that new functions use "linker" names which are
#define newFunction __newFunction
float __newFunction(float);
with implementations being allowed to use any reserved names they saw fit,
or else the names of the new functions (the latter being allowed for
compatibility with code targeting any existing implementations that might
have already been providing such names).
Code which wants to use newFunction() would be required to include the proper
header, rather than being allowed to define the prototype itself, but the
existence of new functions would have no effect upon any existing code that
didn't include the new header, even if it exported symbols with the same
#ifndef newFunction
#define newFunction(x,y) myAlternativeToNewFunction((x),(y))
#endif
and benefit from the new function when it exists, while also working just
fine (though perhaps less efficiently) on systems where it doesn't.
Post by j***@verizon.net
However, hexadecimal constants are definitely not an example of this. The
syntax was carefully chosen to be a syntax error in C90. Backwards
compatibility is a goal that the C committee supports, but only for strictly
conforming code; it does NOT include requiring that new versions of C continue
to treat as syntax errors or constraint violations everything that previous
versions did.
One could write strictly-conforming programs which behave differently on
C90 and C99 because of the changed definition of pp-number. Given, e.g.
#define QQ qq
a function that stringizes 0x1.0p+QQ after performing macro substitutions
would yield the string literal "0x1.0p+QQ" or "0x1.0p+qq" based upon whether
the preprocessor regarded "+QQ" as part of the preceding token. The
preprocessor need not know nor care whether the token is valid, and if it
doesn't survive preprocessing except in stringized form, the later stages of
compilation wouldn't get to see it either.
Post by j***@verizon.net
I consider only the currently approved standard, which officially replaced the
older ones, to be the true standard. I admit that it's commonplace for people
to target older versions of the standard. However, something that's "de facto"
cannot, by its very nature, be a real standard, since "de facto", people write
code that doesn't conform to any official standard. The "de facto" standard is
definitely NOT C90 - it's some weird mixture of C90 and the most popular of the
extensions to C90 that are accepted by some widely used compilers. The "de
facto" standard is starting to include some C99isms.
A major purpose of official standards is to tell people how things should
behave *in cases where it would otherwise be unclear*. If everybody would
do something the same way even in the absence of a standard, nothing would
be gained by the presence of a standard unless or until someone had a reason
to want to do something else.
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
s***@casperkitty.com
2017-07-24 19:25:34 UTC
Post by David Kleinecke
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
#define AP ap
#define str1(x) #x
#define str2(x) str1(x)

const char foo1[] = str2(0x1e+AP);
const char foo2[] = str2(0x1p+AP);
const char foo3[] = str2(0x1z+AP);

The definitions of foo1, foo2, and foo3 will be valid regardless of whether
AP gets macro-expanded, but expansion will affect the initial values in
those objects. None of the standards would allow AP to be expanded in foo1,
but all would require it to be expanded in foo3. The expansion in foo2
would be required by C89 but forbidden in C99.

Personally, I don't see much benefit to mandating or forbidding the
expansion of AP in examples like the above. I certainly would see benefits
to forbidding the expansion of AP in 0x1.AP+3, but I can see no reason
outside of contrived scenarios that code would care about whether or not
the text following the + sign was the same token as what precedes it, if
later stages of compilation would accept the code equally well either way.

If someone wants to validate that a piece of code is portable, the best way
to do that would be with an implementation designed for purposes of such
validation. If a piece of source code could be accepted by some compilers
but not others, but would have the same meaning on all compilers that accept
it, having most compilers accept the code by default would be preferable to
having them reject it. The only case where rejection would be preferable
would be if one wanted to ensure that code would be usable on implementations
that can't handle the code. The fewer such implementations exist, the less
need there would be to worry about that.
bartc
2017-07-24 19:54:47 UTC
Post by s***@casperkitty.com
Post by David Kleinecke
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
#define AP ap
#define str1(x) #x
#define str2(x) str1(x)
const char foo1[] = str2(0x1e+AP);
const char foo2[] = str2(0x1p+AP);
const char foo3[] = str2(0x1z+AP);
Personally, I don't see much benefit to mandating or forbidding the
expansion of AP in examples like the above.
I tried that code with this program:

#include <stdio.h>
....
int main(void) {

printf("%s\n",foo1);
printf("%s\n",foo2);
printf("%s\n",foo3);

}

And got these results:

-----------------------------------
GCC (options: -m64 -o c.o -c -O3 otherwise people get upset):
0x1e+AP
0x1p+AP
0x1z+ap

TCC:
0x1e+AP
0x1p+AP
0x1z+ap

lccwin64:
0x1e+AP
0x1p+AP
0x1z+ap

Pelles C:
0x1e+AP
0x1p+AP
0x1z+ap

MSVC 2017:
0x1e + ap
0x1p + ap
0x1z + ap

DMC:
0x1e+ap
0x1p+ap
0x1z+ap

(Mine:)
0x1e+ap
0x1p+ap
0x1 z+ap
-----------------------------------

The first four are the same, but the other three including MSVC seem to
do their own thing. Because of that, I wouldn't worry about any
particular compiler conforming or not. But I wouldn't want a program
depending on the output from a particular implementation.

What *should* be the correct output for this code? I doubt many people
would know.
--
bartc
David Kleinecke
2017-07-24 20:16:49 UTC
Post by s***@casperkitty.com
Post by David Kleinecke
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
#define AP ap
#define str1(x) #x
#define str2(x) str1(x)
const char foo1[] = str2(0x1e+AP);
const char foo2[] = str2(0x1p+AP);
const char foo3[] = str2(0x1z+AP);
The definitions of foo1, foo2, and foo3 will be valid regardless of whether
AP gets macro-expanded, but expansion will affect the initial values in
those objects. None of the standards would allow AP to be expanded in foo1,
but all would require it to be expanded in foo3. The expansion in foo2
would be required by C89 but forbidden in C99.
Personally, I don't see much benefit to mandating or forbidding the
expansion of AP in examples like the above. I certainly would see benefits
to forbidding the expansion of AP in 0x1.AP+3, but I can see no reason
outside of contrived scenarios that code would care about whether or not
the text following the + sign was the same token as what precedes it, if
later stages of compilation would accept the code equally well either way.
If someone wants to validate that a piece of code is portable, the best way
to do that would be with an implementation designed for purposes of such
validation. If a piece of source code could be accepted by some compilers
but not others, but would have the same meaning on all compilers that accept
it, having most compilers accept the code by default would be preferable to
having them reject it. The only case where rejection would be preferable
would be if one wanted to ensure that code would be usable on implementations
that can't handle the code. The fewer such implementations exist, the less
need there would be to worry about that.
Thanks - I suspected stringizing was necessary but I have
never used stringizing in code and am not very fluent in
it.

The question matters because exactly where the pp-number error
is declared matters. The standard says translation phase 7. But
I suppose the saving from moving it to phase 4 is trivial, however
contrived the examples of why not.
j***@verizon.net
2017-07-24 19:33:06 UTC
On Monday, July 24, 2017 at 2:45:40 PM UTC-4, David Kleinecke wrote:
...
Post by David Kleinecke
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
From the C99 Rationale 6.4.8 line 25:

#define mkident(s) s ## 1m
int mkident(int) = 0;

While the Rationale goes on to explain changes in the way that pp-numbers are
defined in C99, these lines of code worked the same way in C90 as in C99: 1m is
a pp-number that doesn't qualify as a number, but mkident concatenates "int"
and "1m" to create the perfectly valid identifier int1m. Similarly, you could
create a string literal using

#define STR(a) #a
char b[] = STR(1m);
David Kleinecke
2017-07-24 20:20:06 UTC
Post by j***@verizon.net
...
Post by David Kleinecke
I am unable to concoct an example of a pp-number which is
not a number being changed into anything acceptable to the
C(89) parser by some macro action. Anybody got an example?
#define mkident(s) s ## 1m
int mkident(int) = 0;
While the Rationale goes on to explain changes in the way that pp-numbers are
defined in C99, these lines of code worked the same way in C90 as in C99: 1m is
a pp-number that doesn't qualify as a number, but mkident concatenates "int"
and "1m" to create the perfectly valid identifier int1m. Similarly, you could
create a string literal using
#define STR(a) #a
char b[] = STR(1m);
It is no problem for the preprocessor to detect things
like your first example, but thanks for the reply.

Ben Bacarisse
2017-07-18 02:25:25 UTC
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
assignment-expression = // reset rvalue
0 OPEN A
# 7
A type-name CLOSE 0 // set rvalue - push cast prefix operator
expression CLOSE 5 // primitive
4 OPEN B
# 7
B type-name CLOSE 4 // push cast prefix operator
expression CLOSE 5 // primitive
6 OPEN C
# 7
C type-name CLOSE 1 // object (size of type)
expression CLOSE 5 // primitive
3 OPEN 9
# 7
9 expression CLOSE 5 // primitive
7 INCREMENT 3 // push prefix operator
DECREMENT 3 // push prefix operator
SIZEOF 6
UnaryOperator 4 // push prefix operator
# 8
8 Identifier 5 // primitive
Constant 5 // primitive
StringLiteral 5 // primitive
5 START expression STOP 5
OPEN D
PERIOD Identifier 5
ARROW Identifier 5
INCREMENT 5
DECREMENT 5
# 1 // object (expanded primitive)
D assignment-expression E // prepare one argument
CLOSE 5
E COMMA D
CLOSE 5
1 AssignmentOperator 2 // ERROR if rvalue
QUERY expression COLON 2 // set rvalue
BinaryOperator 2 // set rvalue
# 1 // clear
2 # 0 // apply operator
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron. The same motive
that leads people to code in assembly.
That rather begs the question. What are the benefits of this
metaphorical closeness? Programs written in HLLs are usually considered
clearer and easier to maintain, so people write assembler only when the
benefits are very great. Automatically generated parsers would also
have these high-level benefits so there should be corresponding benefits
for eschewing them.
Post by David Kleinecke
It also leads IMO
to a clearer understanding of the dividing line between
syntax and constraints/semantics. I could modify the machine
in many ways to avoid making unnecessary semantic rulings -
like noticing that
666.foo
is an error.
That may be a function of familiarity. If you knew a parser generator
well, you'd probably find it equally clear and just as simple to modify.
Post by David Kleinecke
The machine works like this: There is a set of states -
these are indicated by the numbers in the first column.
(in hex, presumably)
Post by David Kleinecke
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false).
What happens if calling another machine pulls in lots of tokens before
failing?
Post by David Kleinecke
If the token is recognized it is used and
the next token brought in. The number after the test
is the next state. I abbreviated series of successive
tests without any alternatives. The # "test" is the
default. States without a # end in an error. The +
means the machine function returns true - all other
returns are false.
I can't see any + in the code above, but this machine must surely be
able to succeed, no?
Post by David Kleinecke
Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
It took a while for me to work out what START and STOP were. It would
be much clearer to write the tokens out in some self-evident way: '('
instead of OPEN for example.

What happens to this machine when, in state 1, it sees neither an
assignment operator, a binary operator nor a QUERY? It would appear to
simply loop forever.

There's a hint that a stack is involved somewhere (the comments say push
a few times) but nothing more than a hint. Is there a stack?

I see that "set rvalue" is done in a few places, presumably to be able
to honour the "ERROR if rvalue" comment on seeing an assignment. This
makes it sound like a flag, but nothing ever unsets the rvalue flag (and
it looks like it should be set in lots of other cases too). How does
this setting work?
--
Ben.
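The notation David describes can be read as a table-driven interpreter. Its parts are:
- tests tried in order within a state;
- single-token tests for ALL-CAPS names;
- token-set tests for CamelCase names;
- calls to other machines;
- # as an always-true default;
- + as return-true;
- failure when nothing matches.

Below is a minimal sketch under that reading. The machine, the one-character tokens ('i' identifier, 'n' constant) and every name in it are hypothetical. It leaves out the abbreviated multi-test rows and the compiler actions in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind {
    TOK,    /* one specific token (ALL-CAPS names in the listing)    */
    CLASS,  /* any token from a set (CamelCase names in the listing) */
    CALL,   /* run another machine; succeeds if it returns true      */
    DFLT    /* the '#' default: always succeeds, consumes nothing    */
};

enum { ACCEPT = -1 };   /* next state '+': the machine returns true */

struct test {
    int state;          /* state label this test belongs to          */
    enum kind kind;
    const char *set;    /* TOK: one-char token; CLASS: set of tokens */
    int sub;            /* CALL: index of the machine to run         */
    int next;           /* next state label, or ACCEPT               */
};

struct machine { const struct test *rows; size_t n; };

/* Hypothetical toy expression machine, not David's. */
static const struct test expr_rows[] = {
    {0, TOK,   "i",    0, 1},
    {0, TOK,   "n",    0, 1},
    {0, TOK,   "(",    0, 2},
    {2, CALL,  NULL,   0, 3},           /* nested expression */
    {3, TOK,   ")",    0, 1},
    {1, CLASS, "+-*/", 0, 0},           /* BinaryOperator */
    {1, DFLT,  NULL,   0, ACCEPT},
};

static const struct machine machines[] = {
    { expr_rows, sizeof expr_rows / sizeof expr_rows[0] },
};

static int run(int m, const char *in, size_t *pos)
{
    const struct machine *mc;
    int state = 0;
    assert(m >= 0 && (size_t)m < sizeof machines / sizeof machines[0]);
    mc = &machines[m];
    for (;;) {
        int moved = 0;
        for (size_t i = 0; i < mc->n && !moved; i++) {
            const struct test *t = &mc->rows[i];
            size_t mark = *pos;
            int ok = 0;
            if (t->state != state)
                continue;
            switch (t->kind) {
            case TOK:
                ok = in[*pos] != '\0' && in[*pos] == t->set[0];
                break;
            case CLASS:
                ok = in[*pos] != '\0' && strchr(t->set, in[*pos]) != NULL;
                break;
            case CALL:
                ok = run(t->sub, in, pos);
                if (!ok)
                    *pos = mark;        /* rewind to where the call began */
                break;
            case DFLT:
                ok = 1;
                break;
            }
            if (!ok)
                continue;               /* try the next test in this state */
            if (t->kind == TOK || t->kind == CLASS)
                (*pos)++;               /* a recognised token is used up */
            if (t->next == ACCEPT)
                return 1;
            state = t->next;
            moved = 1;
        }
        if (!moved)
            return 0;                   /* nothing matched and no '#': error */
    }
}

/* Nonzero if the whole input is one expression under the toy machine. */
int check_expr(const char *in)
{
    size_t pos = 0;
    return run(0, in, &pos) && in[pos] == '\0';
}
```

A # row can lead to a state that takes # again without consuming a token. In that case run() loops forever, which is what Ben describes for state 1 as posted.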
David Kleinecke
2017-07-18 04:19:06 UTC
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Why such an old standard?
Post by David Kleinecke
The machine as presented is a syntax checker but I
have added comments about what is needed to make it
a compiler. There is a great deal more I could say
but someone familiar with C compilers should be able
to follow this approach.
Not without a lot of guess-work about the notation. If you want anyone
to follow it, you should give a brief description.
Post by David Kleinecke
assignment-expression = // reset rvalue
0 OPEN A
# 7
A type-name CLOSE 0 // set rvalue - push cast prefix operator
expression CLOSE 5 // primitive
4 OPEN B
# 7
B type-name CLOSE 4 // push cast prefix operator
expression CLOSE 5 // primitive
6 OPEN C
# 7
C type-name CLOSE 1 // object (size of type)
expression CLOSE 5 // primitive
3 OPEN 9
# 7
9 expression CLOSE 5 // primitive
7 INCREMENT 3 // push prefix operator
DECREMENT 3 // push prefix operator
SIZEOF 6
UnaryOperator 4 // push prefix operator
# 8
8 Identifier 5 // primitive
Constant 5 // primitive
StringLiteral 5 // primitive
5 START expression STOP 5
OPEN D
PERIOD Identifier 5
ARROW Identifier 5
INCREMENT 5
DECREMENT 5
# 1 // object (expanded primitive)
D assignment-expression E // prepare one argument
CLOSE 5
E COMMA D
CLOSE 5
1 AssignmentOperator 2 // ERROR if rvalue
QUERY expression COLON 2 // set rvalue
BinaryOperator 2 // set rvalue
# 1 // clear
2 # 0 // apply operator
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron. The same motive
that leads people to code in assembly.
That rather begs the question. What are the benefits of this
metaphorical closeness? Programs written in HLLs are usually considered
clearer and easier to maintain, so people write assembler only when the
benefits are very great. Automatically generated parsers would also
have these high-level benefits so there should be corresponding benefits
for eschewing them.
Compilers are not ordinary programs and have different economics.
Ease of maintenance is not an important aspect. And one can do
special things much more easily.
Post by Ben Bacarisse
Post by David Kleinecke
It also leads IMO
to a clearer understanding of the dividing line between
syntax and constraints/semantics. I could modify the machine
in many ways to avoid making unnecessary semantic rulings -
like noticing that
666.foo
is an error.
That may be a function of familiarity. If you knew a parser generator
well, you'd probably find it equally clear and just as simple to modify.
Post by David Kleinecke
The machine works like this: There is a set of states -
these are indicated by the numbers in the first column.
(in hex, presumably)
Your little joke?
Post by Ben Bacarisse
Post by David Kleinecke
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false).
What happens if calling another machine pulls in lots of tokens before
failing?
With C, by design, this is not a problem. A general parser
would simply note where the submachine tokens begin and revert
the input token stream back to that point if the submachine
turned out to be a garden path.
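The mark-and-rewind scheme can be sketched for the OPEN case, where a parenthesis may start either a cast or a parenthesized expression. This is an illustration only. The token representation, the three type keywords and the one-token "expression" are hypothetical simplifications, and a real C front end also has to recognise typedef names as type names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token stream: token spellings plus a cursor. */
struct tokens {
    const char *const *tok;
    size_t n, pos;
};

static const char *peek(const struct tokens *ts)
{
    return ts->pos < ts->n ? ts->tok[ts->pos] : "";
}

static int match(struct tokens *ts, const char *s)
{
    if (strcmp(peek(ts), s) != 0)
        return 0;
    ts->pos++;
    return 1;
}

static int is_type_keyword(const char *s)
{
    return strcmp(s, "int") == 0 || strcmp(s, "char") == 0 ||
           strcmp(s, "long") == 0;
}

/* Toy type-name: one type keyword followed by any number of '*'. */
static int type_name(struct tokens *ts)
{
    if (!is_type_keyword(peek(ts)))
        return 0;
    ts->pos++;
    while (match(ts, "*"))
        ;
    return 1;
}

/* Toy expression: one identifier or constant that is not a keyword. */
static int expression(struct tokens *ts)
{
    const char *t = peek(ts);
    if (!(isalnum((unsigned char)t[0]) || t[0] == '_') || is_type_keyword(t))
        return 0;
    ts->pos++;
    return 1;
}

enum paren_kind { PK_ERROR, PK_CAST, PK_PAREN };

/* After '(' try "type-name )" first. If that is a garden path, rewind
 * and try "expression )", as states A, B and C of the listing do. */
enum paren_kind classify_paren(struct tokens *ts)
{
    size_t mark;
    assert(ts != NULL);
    if (!match(ts, "("))
        return PK_ERROR;
    mark = ts->pos;                     /* note where the sub-machines start */
    if (type_name(ts) && match(ts, ")"))
        return PK_CAST;
    ts->pos = mark;                     /* revert the token stream */
    if (expression(ts) && match(ts, ")"))
        return PK_PAREN;
    ts->pos = mark;
    return PK_ERROR;
}
```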
Post by Ben Bacarisse
Post by David Kleinecke
If the token is recognized it is used and
the next token brought in. The number after the test
is the next state. I abbreviated series of successive
tests without any alternatives. The # "test" is the
default. States without a # end in an error. The +
means the machine function returns true - all other
returns are false.
I can't see any + in the code above, but this machine must surely be
able to succeed, no?
Post by David Kleinecke
Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
It took a while for me to work out what START and STOP were. It would
be much clearer to write the tokens out in some self-evident way: '('
instead of OPEN for example.
What happens to this machine when, in state 1, it sees neither an
assignment operator, a binary operator nor a QUERY? It would appear to
simply loop forever.
There's a hint that a stack is involved somewhere (the comments say push
a few times) but nothing more than a hint. Is there a stack?
I see that "set rvalue" is done in a few places, presumably to be able
to honour the "ERROR if rvalue" comment on seeing an assignment. This
makes it sound like a flag, but nothing ever unsets the rvalue flag (and
it looks like it should be set in lots of other cases too). How does
this setting work?
--
Ben.
Ben Bacarisse
2017-07-19 01:11:46 UTC
<snip>
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Parsing tools are easy to use and generally produce reliable and
maintainable results. And established methods for writing parsers by
hand have a solid theoretical basis behind them. What's the benefit of
whatever your method is? (I can't discern the method from this example.)
My motive is to get closer to the iron. The same motive
that leads people to code in assembly.
That rather begs the question. What are the benefits of this
metaphorical closeness? Programs written in HLLs are usually considered
clearer and easier to maintain, so people write assembler only when the
benefits are very great. Automatically generated parsers would also
have these high-level benefits so there should be corresponding benefits
for eschewing them.
Compilers are not ordinary programs and have different economics.
Ease of maintenance is not an important aspect. And one can do
special things much more easily.
OK, I should not have tried to guess what you meant with the analogy to
assembler. I'd like to know what you believe is the benefit of this way
of doing things, stated explicitly. By referring to why people use
assembler you invite a comparison but I clearly did not get what you
meant by it. What's the benefit to a parser of being "close to the
iron".

<snip>
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
The machine works like this: There is a set of states -
these are indicated by the numbers in the first column.
(in hex, presumably)
Your little joke?
How is that a joke? Your example has labels 0 to E. Initially I
wondered what the significance of the letters as labels vs. digits as
labels. But you now say the labels are all numbers so the A to E labels
must be numbers in hex, no?
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false).
What happens if calling another machine pulls in lots of tokens before
failing?
With C, by design, this is not a problem. A general parser
would simply note where the submachine tokens begin and revert
the input token stream back to that point if the submachine
turned out to be a garden path.
OK, so this is a backtracking parser.
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
... The +
means the machine function returns true - all other
returns are false.
I can't see any + in the code above, but this machine must surely be
able to succeed, no?
Post by David Kleinecke
Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
It took a while for me to work out what START and STOP were. It would
be much clearer to write the tokens out in some self-evident way: '('
instead of OPEN for example.
What happens to this machine when, in state 1, it sees neither an
assignment operator, a binary operator nor a QUERY? It would appear to
simply loop forever.
There's a hint that a stack is involved somewhere (the comments say push
a few times) but nothing more than a hint. Is there a stack?
I see that "set rvalue" is done in a few places, presumably to be able
to honour the "ERROR if rvalue" comment on seeing an assignment. This
makes it sound like a flag, but nothing ever unsets the rvalue flag (and
it looks like it should be set in lots of other cases too). How does
this setting work?
If you want people to comment I think you should address these
questions. They don't seem to me to be trivial, or I would not have
asked them.
--
Ben.
David Kleinecke
2017-07-22 02:20:41 UTC
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
(in hex, presumably)
Your little joke?
How is that a joke? Your example has labels 0 to E. Initially I
wondered what the significance of the letters as labels vs. digits as
labels. But you now say the labels are all numbers so the A to E labels
must be numbers in hex, no?
Come on - try a little harder. I could use any alphanumeric
string. But I should have written "characters", not "numbers".
My bad.
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
The state numbers are purely labels and have no numeric
interpretation. In each state there are several tests
that are made in order. Each test either involves
identifying the current token or calling a function
(which is another state machine and returns either true
or false).
What happens if calling another machine pulls in lots of tokens before
failing?
With C, by design, this is not a problem. A general parser
would simply note where the submachine tokens begin and revert
the input token stream back to that point if the submachine
turned out to be a garden path.
OK, so this is a backtracking parser.
Post by David Kleinecke
Post by Ben Bacarisse
Post by David Kleinecke
... The +
means the machine function returns true - all other
returns are false.
I can't see any + in the code above, but this machine must surely be
able to succeed, no?
Post by David Kleinecke
Token identifiers in all caps are
unique tokens. Tests of things in camel case are tests
for one of a set of tokens.
It took a while for me to work out what START and STOP were. It would
be much clearer to write the tokens out in some self-evident way: '('
instead of OPEN for example.
What happens to this machine when, in state 1, it sees neither an
assignment operator, a binary operator nor a QUERY? It would appear to
simply loop forever.
Thank you. There is a serious error there. Like all errors
I cannot imagine how it happened. The last line in state 1
should be:
# +
meaning return true. The comment is correct.
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
There's a hint that a stack is involved somewhere (the comments say push
a few times) but nothing more than a hint. Is there a stack?
The stack is only involved if I expand the syntax checker
into a compiler. All the comments apply to the compiler expansion
and can be ignored for the syntax checker. There are several
stacks involved in compilation. The one the comments refer to is
the stack of operators waiting to be applied.
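The thread does not say how the waiting operators are applied. One common technique, swapped in here rather than taken from David's design, is operator-precedence evaluation with an operand stack and an operator stack.

The sketch below evaluates instead of generating code. It handles only single-digit operands and the binary + - * /. eval_ops is a hypothetical name.

```c
#include <assert.h>

/* Precedence for the toy's binary operators: * and / bind tighter. */
static int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

/* Pop one operator and its two operands, push the result. */
static void reduce(long *vals, int *nv, const char *ops, int *no)
{
    long b = vals[--*nv];
    long a = vals[--*nv];
    char op = ops[--*no];
    long r;
    switch (op) {
    case '+': r = a + b; break;
    case '-': r = a - b; break;
    case '*': r = a * b; break;
    default:  r = a / b; break;         /* '/': toy assumes b != 0 */
    }
    vals[(*nv)++] = r;
}

/* Evaluate a well-formed string such as "1+2*3" (single-digit operands,
 * binary + - * / only, at most 32 operators). */
long eval_ops(const char *s)
{
    long vals[33];
    char ops[32];
    int nv = 0, no = 0;
    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            assert(nv < 33);
            vals[nv++] = *s - '0';
            continue;
        }
        /* Left associativity: first apply waiting operators of equal
         * or higher precedence, then push the new one. */
        while (no > 0 && prec(ops[no - 1]) >= prec(*s))
            reduce(vals, &nv, ops, &no);
        assert(no < 32);
        ops[no++] = *s;
    }
    while (no > 0)
        reduce(vals, &nv, ops, &no);
    assert(nv == 1);
    return vals[0];
}
```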
Post by Ben Bacarisse
Post by David Kleinecke
Post by Ben Bacarisse
I see that "set rvalue" is done in a few places, presumably to be able
to honour the "ERROR if rvalue" comment on seeing an assignment. This
makes it sound like a flag, but nothing ever unsets the rvalue flag (and
it looks like it should be set in lots of other cases too). How does
this setting work?
I used the verb "reset" on the first line. I read the rvalue flag
as meaning the object cannot be assigned to. I think I have this
right. The C syntax allows a good many constructions that the
constraints and semantics forbid.
Post by Ben Bacarisse
If you want people to comment I think you should address these
questions. They don't seem to me to be trivial, or I would not have
asked them.
I hope I addressed them adequately. Thank you for catching my
error.
h***@gmail.com
2017-07-22 02:51:12 UTC
Post by David Kleinecke
Just for laughs I am posting my state machine for C
assignment expressions. This machine implements the
entire section 6.3 syntax in the 1989 C Standard.
Seems to me that syntax checkers are a lost art.

When computers were slower, and especially when they cost real
money to use, it was sometimes useful to use a syntax checker
before the actual compiler.

In the case of C, there was lint.

But now it is easier to just run the compiler.