question about linker

I don't use them in generated code either. (Only in a brief section at
the top to define my preferred type designations.)

I am generating the prototypes for all functions called.
No includes, no macros.
The generated code depends on compiler flags, platform and
headers. It is intended for direct use in a pipeline such as
my_compiler -> C89 -> CC

Post by Thiago Adams
At this output, I am generating the prototypes for the functions I call.
For instance,
int strcmp( const char* lhs, const char* rhs );
is generated as
int strcmp( char* lhs, char* rhs );

I don't use #include either, not even for standard headers, although
gcc doesn't like it when I define my own std library functions. There
are ways to shut it up though.

Post by Thiago Adams
I am also generating the structs as required (on demand). I am
renaming the structs because I am generating all of them at
global scope.
Does the compiler or linker care if I am lying about const? Or if I
rename the structs?
I think it does not care, and it seems to work (it compiles and runs).

I don't know why the linker would care about anything. All it sees are
symbol imports and exports.
A compiler might care about lack of 'const' in that it could stop it
doing some optimisations.
But it can't report a type mismatch between 'const' and non-'const'
types if 'const' has been banished completely.

I think GCC has some builtin functions and it can complain if the
function prototype differs.
Do you have any idea what else can be simplified when creating a C
compiler?

What are you asking; are you thinking of writing one? Because C
compilers already exist!

If so, think of what you would find troublesome. I could create a long
list of things that makes C harder to compile than my own language.

I was thinking about literal strings.
Instead of
f("abc");
Generating something like
char literal_string_1[] = {'a', 'b', 'c', '\0' }; //or numbers global
f(literal_string_1);

I wouldn't bother with this. How hard is it to deal with string
constants? Even a 4KB BASIC from a microcomputer had them!

If you want to simplify, perhaps get rid of 'A' constants. (Unless you
think you're likely to be running on an EBCDIC system. However in my
compilers, 'A' gets turned into 65 early on anyway.)

The reason is that I believe compilers already have to do this, right?
(put strings in a data section)

Data like {'a', 'b', 'c', 0} will be put into a .data segment. But
"abc" will be put by gcc into a .rodata segment, so it's safer. Maybe
'const' will fix the former, but you no longer have that.
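
For illustration, a minimal sketch of the lowering being discussed, assuming a generator that emits each literal as a named global array. The names literal_string_1, literal_string_2, count_chars and lowered_matches_literal are hypothetical. Without 'const' the arrays are ordinary writable objects; where a given compiler places them is implementation-specific.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generated global standing in for the literal "abc". */
char literal_string_1[] = {'a', 'b', 'c', '\0'};

/* Same bytes written as numbers; this assumes an ASCII execution
   character set, which the generator would have to know about. */
char literal_string_2[] = {97, 98, 99, 0};

/* Hypothetical callee: counts characters up to the terminating zero. */
size_t count_chars(char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Returns 1 when the lowered forms hold the same bytes as "abc"
   (the second comparison only holds on ASCII systems). */
int lowered_matches_literal(void)
{
    return count_chars(literal_string_1) == count_chars("abc")
        && memcmp(literal_string_1, "abc", 4) == 0
        && memcmp(literal_string_2, "abc", 4) == 0;
}
```

The callee sees the same bytes either way; what changes is that the array form can be written to, which is the safety point raised above.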

So this was one extra simplification I was thinking about.
Also remove loops (for, while) and switches.

This sort of simplification is of more benefit when /generating/ C code.
It's not usually something to worry about in the tool that will turn
that intermediate C into native code.

This comes back to the question above.

BGB

2024-11-26 23:23:36 UTC

I don't use them in generated code either. (Only in a brief section
at the top to define my preferred type designations.)

I don't use #include either, not even for standard headers, although
gcc doesn't like it when I define my own std library functions. There
are ways to shut it up though.

I don't know why the linker would care about anything. All it sees
are symbol imports and exports.
A compiler might care about lack of 'const' in that it could stop it
doing some optimisations.
But it can't report a type mismatch between 'const' and non-'const'
types if 'const' has been banished completely.

I think GCC has some builtin functions and it can complain if the
function prototype differs.
Do you have any idea what else can be simplified when creating a C compiler?

What are you asking; are you thinking of writing one? Because C
compilers already exist!
If so, think of what you would find troublesome. I could create a long
list of things that makes C harder to compile than my own language.

My usual list of things:
  Drop stuff that is pretty much never used;
    Such as digraphs and trigraphs.
  Drop stuff that is very rarely used:
    Such as C's bitfield declarations;
  Simplify the type syntax.
    Allow declaration parsing without checking prior typedefs;
    Eliminate some syntactic ambiguities;
  Simplify the type-system:
    Drop complex cases (*1).

*1: Or, what probably to keep:
  Primitive types;
  Pointer to a primitive type (up to N levels);
  Pointer to a structure;
  Pointer to a function.

An array might become an aspect of a declaration rather than the type of
the declaration (so, values with an array type might no longer exist,
only a pointer to a type, with the destination happening to be an
array). Loading an array into an expression would implicitly always
decay to a pointer to the primitive type. In this model, things like
multidimensional arrays being part of the type-system would essentially
disappear (but can be done in other ways, such as via manual calculation
or pointer indirection).
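
As a rough sketch of the "manual calculation" alternative, assuming a row-major layout: the storage is one flat block and the two-dimensional indexing is plain pointer arithmetic. The names grid_storage, grid_at and grid_demo are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* One flat block of storage; no two-dimensional array type is involved. */
int grid_storage[ROWS * COLS];

/* Hypothetical helper: address of element (row, col) by manual calculation. */
int *grid_at(int *base, size_t row, size_t col)
{
    return base + row * COLS + col;
}

/* Fills the grid so each cell holds row * 10 + col, then reads one back. */
int grid_demo(void)
{
    size_t r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            *grid_at(grid_storage, r, c) = (int)(r * 10 + c);
    return *grid_at(grid_storage, 2, 3);
}
```

Here the type system only ever sees "pointer to int"; the row length is a number the program carries around itself.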

One could possibly also allow for more implementation-defined limits, say:
  A function may not contain more than 256 local variables;
  A function may not have more than 16 or 32 arguments;
  A function may not have more than 16K of local array or structure storage;
  A function may not contain more than 4092 assignments to the same local
    variable;
  A "switch()" may not contain more than 1020 "case" labels;
  ...

Possibly, the conceptual model for by-value structure passing could be
modified:
The concept of the structure-type and data storage could be split into a
conceptually paired structure-pointer type and an "untyped blob of
bytes" type;
Struct assignment could be defined to behave "as-if" it were a memcpy
from one blob of bytes to another;
By-value structs might have a minimum alignment padding larger than that
otherwise implied for the structure (say, for example, if any structure
larger than 64 bytes was implicitly padded to a multiple of 16 bytes,
and one larger than 32 to a multiple of 8);
Using a structure in an expression will cause it to decay to a pointer
in a similar way to arrays.
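
The "as-if memcpy" reading of struct assignment can be sketched in ordinary C as follows. This only illustrates the equivalence for a plain struct; the names pair, assign_direct, assign_as_blob and assignments_agree are hypothetical.

```c
#include <string.h>

struct pair {
    int a;
    double b;
};

/* What the source program writes. */
void assign_direct(struct pair *dst, const struct pair *src)
{
    *dst = *src;
}

/* The "as-if" lowering: copy an untyped blob of sizeof(struct pair) bytes. */
void assign_as_blob(void *dst, const void *src)
{
    memcpy(dst, src, sizeof(struct pair));
}

/* Returns 1 when both forms leave the same member values in the target. */
int assignments_agree(void)
{
    struct pair src = {7, 2.5};
    struct pair x = {0, 0.0};
    struct pair y = {0, 0.0};
    assign_direct(&x, &src);
    assign_as_blob(&y, &src);
    return x.a == y.a && x.b == y.b && y.a == 7 && y.b == 2.5;
}
```

With this model the backend needs only the size of the blob, not the member list, to implement assignment.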

Similarly, may drop the distinction between "a.b" and "a->b" as serving
little real purpose.

A lot of this would potentially break source compatibility with C in
lots of subtle ways, but could allow for a simpler compiler.

I was thinking about literal strings.
Instead of
f("abc");
Generating something like
char literal_string_1[] = {'a', 'b', 'c', '\0' }; //or numbers global
f(literal_string_1);

I wouldn't bother with this. How hard is it to deal with string
constants? Even a 4KB BASIC from a microcomputer had them!
If you want to simplify, perhaps get rid of 'A' constants. (Unless you
think you're likely to be running on an EBCDIC system. However in my
compilers, 'A' gets turned into 65 early on anyway.)

My compiler normally assumes one of:
Plain ASCII;
Codepage-1252;
UTF-8;
UTF-16.

My project also tends to use a modified version of the Unicode space
where the C1 control codes are dropped in favor of interpreting them as
the corresponding 1252 characters. This allows treating 8859-1 and 1252
scenarios as equivalent, and because actually using the C1 control codes
is very rare.

It could be considered to some extent context-sensitive though.

The reason is that I believe compilers already have to do this, right?
(put strings in a data section)

Data like {'a', 'b', 'c', 0} will be put into a .data segment. But
"abc" will be put by gcc into a .rodata segment, so it's safer. Maybe
'const' will fix the former, but you no longer have that.

BGBCC generally uses ".strtab":
".data", read/write global data;
".rodata", read-only global data;
".strtab", strings table.

So this was one extra simplification I was thinking about.
Also remove loops (for, while) and switches.

Yeah.

If one has a full parser, this would save very little.
It would make sense if one is doing a limited parser.
But, then, it may also make sense to limit *any* sort of complex
expressions.

Say:
z=x*y+3;
Is no longer valid and needs to be decomposed:
t0=x*y;
z=t0+3;

This could then allow the compiler to operate one line at a time.
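
The decomposition above, written out as two complete C functions so the forms can be compared. The temporary t0 comes from the example; the function names compute_nested and compute_flat are hypothetical.

```c
/* Original form: one statement containing a nested expression. */
int compute_nested(int x, int y)
{
    int z;
    z = x * y + 3;
    return z;
}

/* Decomposed form: at most one operator per statement, so each line
   can be translated without building an expression tree. */
int compute_flat(int x, int y)
{
    int t0;
    int z;
    t0 = x * y;
    z = t0 + 3;
    return z;
}
```

Both functions compute the same value; the cost of the flat form is the extra temporary the generator has to declare.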

But, at that point, almost may as well drop down to a BASIC or FORTRAN
style syntax.

fir

2024-11-26 23:54:43 UTC

Such as C's bitfield declarations;

lol, bitfields very rarely used?
they may be used rarely but they are very important and needed,
when you, say, have a map of tiles or big arrays of characters
you need a couple of bit-size flags in them and you don't want to
waste a lot of memory on chars or do logical arithmetic on bytes

for sure you can't skip bitfields

BGB

2024-11-27 00:10:48 UTC

Post by fir

Such as C's bitfield declarations;

Almost always, code will do something like:
  if(val&FLAG)
Or:
  r=(val>>10)&31;
  g=(val>> 5)&31;
  b=(val>> 0)&31;
  ...

But:
  struct rbg555_s {
    uint16_t b:5;
    uint16_t g:5;
    uint16_t r:5;
    uint16_t t:1;
  };
Not so much...
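
A small sketch of the shift-and-mask style for a 5-5-5 pixel, using the same bit positions as the snippet above (r at bit 10, g at bit 5, b at bit 0). The helper names are hypothetical, and unsigned int is used instead of uint16_t so the sketch stays within C89.

```c
/* Pack three 5-bit channels into one value: r in bits 10..14,
   g in bits 5..9, b in bits 0..4. */
unsigned int rgb555_pack(unsigned int r, unsigned int g, unsigned int b)
{
    return ((r & 31u) << 10) | ((g & 31u) << 5) | (b & 31u);
}

/* Extract each channel again with a shift and a mask. */
unsigned int rgb555_red(unsigned int val)   { return (val >> 10) & 31u; }
unsigned int rgb555_green(unsigned int val) { return (val >> 5) & 31u; }
unsigned int rgb555_blue(unsigned int val)  { return val & 31u; }
```

Unlike a bitfield struct, whose bit allocation order is implementation-defined, the bit positions here are fixed by the code itself.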

Bart

2024-11-27 00:59:43 UTC

Post by Thiago Adams
Do you have any idea what else can be simplified when creating a C compiler?

What are you asking; are you thinking of writing one? Because C
compilers already exist!
If so, think of what you would find troublesome. I could create a long
list of things that makes C harder to compile than my own language.

Since BGB posted a list, here's mine; it is not exhaustive:

Hard C Features

* Type declarations (which mix up variables, function pointers,
function declarations in the same declaration, and even
function definitions start off the same).

* At filescope, multiple definitions of the same variable, which
can have different linkage, eg static and extern, with only
some orderings legal

* Multiple declarations of the same function, which can have different
parameter names

* Figuring out whether any module-scope variable is imported, or
exported, or even if it's local, since 'static' can be used at the
same time as 'extern' or just nothing.

* Figuring out, when compiling a dynamic library, which non-static
functions (and variables) should be exported from the library.

* A zoo of different integer types, perhaps 30 ways of defining the
8 basic machine types. Except C likes to use 11 basic types (to
include plain 'char' and 'long' between 'int' and 'long long') which
are expected to be distinct from each other, even if the same size.

* Multiple ways of denoting such types ('int long unsigned long' for
example)

* Multiple ways of placing 'const', 'static' and 'typedef':

int const const const unsigned typedef T;

* Forward declarations of enums and struct tags

* Three different namespaces: normal, struct/enum tags, and labels

* Unlimited numbers of distinct block scopes within a function

* 'break' serving two different roles depending on context

* The preprocessor, macros and macro expansions (this is an entire
language by itself)

* Implementing line continuation using '\', which not many are aware can
occur in the middle of tokens, eg:

/\
/ comment
in\
t a;
"ABC\\ (string with \n split across two lines)
n"

* Implementation-defined algorithm to locate any include file deep inside
a nested set of includes. (You can create your own, but you also have
to be able to compile existing code!)

* The requirement to be able to evaluate any arithmetic/logic expression
at compile-time, if constants are used, even for ?: ternary op

* The algorithm for how many {...} can be omitted from an initialiser;
the following is legal; a completely flat (and incomplete) list can be
provided:

int A[2][3][4] = {1,2,3,4,5,6,7};

* The algorithm for designated initialisers, which is harder than most
people think because designators can be nested: {a.b.c = x}

* The algorithm for determining inter-member struct padding and end-
padding

* VLAs

* Compound literals (especially elements that are runtime expressions)

* Switch-case with its crazy syntax

* Allowing '(********F)(x)' (it is allowed; I can't even remember the
rules for how many more, or fewer, "*" are allowed relative to the
type of F).

* The rules for when ";" is needed after "}".

* The rules for mixed signed integer operators, ie. whether the
operation and result is done as signed or unsigned. (I created an
8 x 8 chart of the combinations; it was weird.)

Etc.

That's just off the top of my head.
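
Two of the items in the list can be checked directly with a short program: where the values of a flat initialiser land, and how a mixed signed/unsigned comparison is evaluated. The function names below are hypothetical; many compilers will warn about both constructs, which is rather the point.

```c
/* A flat, incomplete initialiser: values fill the innermost arrays in
   order and everything left over is zero. */
int A[2][3][4] = {1, 2, 3, 4, 5, 6, 7};

/* Returns 1 when the elements landed where the elision rules put them. */
int flat_initialiser_layout(void)
{
    return A[0][0][3] == 4      /* first inner array holds 1..4 */
        && A[0][1][0] == 5      /* fifth value starts the next inner array */
        && A[0][1][2] == 7
        && A[0][1][3] == 0      /* remainder is zero-initialised */
        && A[1][2][3] == 0;
}

/* Mixed signed/unsigned comparison: -1 is converted to unsigned int
   before comparing, so the result is 0 rather than 1. */
int minus_one_less_than_one_unsigned(void)
{
    return -1 < 1u;
}
```

A compiler has to implement both rules exactly, even though most hand-written code avoids relying on either.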

Thiago Adams

2024-11-27 01:52:37 UTC

Post by Thiago Adams
Do you have any idea what else can be simplified when creating a C compiler?

What are you asking; are you thinking of writing one? Because C
compilers already exist!
If so, think of what you would find troublesome. I could create a long
list of things that makes C harder to compile than my own language.

I think K&R C is simpler than C89, which is simpler than C99, etc.

My objective is to find the minimum code generator (backend) in C and
leave the other complexities (like warnings, static analysis, constexpr,
preprocessor) to the front end.

This also makes it easier to have more than one backend sharing the job
done by the front end.

For instance, I may remove all constant expressions from the generated
code, so the backend does not have to compute constant expressions any
more and it is still C89 compatible. I removed enums, for instance.

So my question is not about how to create a simple C compiler but how to
separate and move most of the job to the front end, creating a very
simple backend (code generator) whose input is C89-compatible code.

I believe that K&R C is simpler than C89, which in turn is simpler than
C99, and so on.

My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.

This approach also facilitates using multiple backends that share the
work handled by the front end.

For example, I might remove all constant expressions from the generated
code, so the backend no longer needs to compute them and the code remains
C89-compatible. I've already removed features such as enums and typedefs.

Therefore, my question isn't about how to create a simple C compiler.

Instead, it's about how to shift most of the workload to the front end,
resulting in a very simple backend (code generator) that processes
C89-compatible code as input.

Does it make sense?

Thiago Adams

2024-11-27 09:37:52 UTC

Post by Thiago Adams
Do you have any idea what else can be simplified when creating a C compiler?

...

Post by Thiago Adams
My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.
This approach also facilitates using multiple backends that share the
work handled by the front end.
For example, I might remove all constant expressions from the generated
code, so the backend no longer needs to compute them and remains C89-
compatible. I've already removed features like enums, typedefs for
instance.

...

I was wondering if splitting expressions would make the backend simpler

for instance

int r = a + b * c;

converted to

int r1 = b * c;
int r2 = a + r1;
int r = r2;

Another option is to extend the generated language with some macros
(the generated code still works in a C compiler),
for instance macros that mark the integer promotions; they are ignored by
a real C compiler but used by this simple C compiler backend.

#define CHAR_TO_INT(X) (X)

int r1 = b * CHAR_TO_INT(c);

Then the backend does not have to worry about this either and just
follows instructions.
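
A compilable sketch of the annotation idea. It assumes the macro is defined as the identity for ordinary C compilers, since the promotion happens implicitly there anyway; a macro that expands to nothing would leave 'b * ;' behind. The function names scale and scale_plain are hypothetical.

```c
/* For an ordinary C compiler the annotation expands to its operand.
   A minimal backend could instead treat the macro name as an explicit
   widening instruction. (Identity definition assumed for this sketch.) */
#define CHAR_TO_INT(X) (X)

/* Annotated form, as the front end would emit it. */
int scale(int b, char c)
{
    int r1 = b * CHAR_TO_INT(c);
    return r1;
}

/* Same computation with no annotation, for comparison. */
int scale_plain(int b, char c)
{
    return b * c;
}
```

Both functions behave identically under a normal compiler, which is what keeps the annotated output valid C89.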

Bart

2024-11-27 11:57:29 UTC

Post by Bart
Hard C Features

...

Post by Thiago Adams
I believe that K&R C is simpler than C89, which in turn is simpler than
C99, and so on.
My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.
This approach also facilitates using multiple backends that share the
work handled by the front end.
For example, I might remove all constant expressions from the generated
code, so the backend no longer needs to compute them and remains
C89-compatible. I've already removed features like enums, typedefs for
instance.
Therefore, my question isn't about how to create a simple C compiler.
Instead, it's about how to shift most of the workload to the front end,
resulting in a very simple backend (code generator) that processes
C89-compatible code as input.
Does it make sense?

Not really. You're basically talking about using an IR or IL, which most
compilers already do, including mine now. Clang for example uses LLVM IR.

Some languages use C as intermediate language.

But you seem to be getting C, and lower level ILs, mixed up.

If you are transpiling to C, then just generate C code, C89 if you like.
In that case you don't need to discard 90% of the language to make it
simpler! Simpler for whom? C89 compilers that can deal with function
prototypes etc already exist; you said you are not writing your own
compiler.

Post by Thiago Adams
I was wondering if splitting expressions would make the backend simpler
for instance
int r = a + b * c;
converted to
int r1 = b * c;
int r2 = a + r1;
int r = r2;

You are still talking as though YOU are writing the backend! You will
either use an existing C compiler or an existing IL backend, but there
aren't that many of the latter.

The most famous is LLVM, but it is fantastically complex, huge and slow.

(To see examples of LLVM IR, go to godbolt.org, choose C language,
choose a Clang compiler, and enter '-S -emit-llvm' as the compiler
options. Then try an example C function in the left panel.)

I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists
for my language. So this original source:

proc F=
int r, a, b, c
r := a + b*c
end

Generates this IL:

Proc f():
i64 r
i64 a
i64 b
i64 c
!------------------------
T1 := b * c i64
T2 := a + T1 i64
r := T2 i64
!------------------------
retproc
End

It looks great, but was hard to work with. Instead I settled on this
lower level IL, which looks like assembly. That one works also with C,
so given this C function:

void F() {
int r, a, b, c;
r = a + b*c;
}

it produces this IL code:

proc F::
local i32 r.1
local i32 a.1
local i32 b.1
local i32 c.1
!------------------------
load i32 a.1 ! 00005
load i32 b.1 ! 00006
load i32 c.1 ! 00007
mul i32 ! 00008
add i32 ! 00009
store i32 r.1 ! 00010
!------------------------
#1:
retproc ! 00013
endproc

(The '::' indicates an exported function, as 'static' was not used. The
.1 suffixes are to do with block scopes, since there can be multiple 'a'
identifiers in a function.)

No function prototypes are needed, since everything needed is specified
at the call-site. The front-end will provide any necessary conversions or
promotions. There are attributes that appear to mark variadic calls for
example.

Only imported functions need to be listed.

This sounds vaguely like what you are trying to achieve, but you have
the idea that this IL must be C.

C however will generally need that extra info (function signatures etc)
or it will cause problems. But it is very little trouble to provide them.

Bart

2024-11-27 12:10:22 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists

I forgot to say that I've also tried transpiling to C from my language.

That makes some things simpler (I don't need to write the backend! And I
get optimisation for free), but C is a poor fit for my language. So
programs that need to be transpiled to C can only use a restricted,
crippled set of features.

My HLL example produces this C:

static void $t$f(void) {
i64 r;
i64 a;
i64 b;
i64 c;
r = (a + (b * c));
}

The advantage of my ILs is that they don't have the restrictions of C.
For example, they will translate this:

(c | a | b) := 0 # (assign 0 to either a or b)

with no problem. The C transpiler will produce this:

(!!(c) ? a : b) = (i64)0;

But this is not legal C.
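
One way such a conditional assignment can be expressed in legal C is to select between addresses and assign through the result, since dereferencing a pointer gives an lvalue. A minimal sketch, with hypothetical function names and plain long standing in for i64:

```c
/* Assigns value to *a when c is non-zero, otherwise to *b. The conditional
   operator yields a pointer, and the dereference is an lvalue. */
void assign_selected(int c, long *a, long *b, long value)
{
    *(c ? a : b) = value;
}

/* Returns 1 when each call modified only the selected variable. */
int selection_demo(void)
{
    long a = 1, b = 2;
    assign_selected(1, &a, &b, 0);   /* writes a */
    if (a != 0 || b != 2)
        return 0;
    assign_selected(0, &a, &b, 9);   /* writes b */
    return a == 0 && b == 9;
}
```

The rewrite is mechanical here, but it requires both targets to be addressable, which is one of the ways the transformation gets harder for a transpiler.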

Thiago Adams

2024-11-27 12:38:41 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists

I forgot to say that I've also tried transpiling to C from my language.
That makes some things simpler (I don't need to write the backend! And I
get optimisation for free), but C is a poor fit for my language. So
programs that need to be transpiled to C can only use a restricted,
crippled set of features.
   static void $t$f(void) {
        i64 r;
        i64 a;
        i64 b;
        i64 c;
        r = (a + (b * c));
   }
The advantage of my ILs is that they don't have the restrictions of C.
    (c | a | b) := 0         # (assign 0 to either a or b)
    (!!(c) ? a : b) = (i64)0;
But this is not legal C.

But you can write it in a different way, can't you?
This raises the question what cannot be done in C?

Thiago Adams

2024-11-27 12:57:45 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists

I was wondering if is possible to write C programs without struct/union?

I did this experiment.

struct X {
    int a, b;
};

void F1() {
    struct X x;
    x.a = 1;
    x.b = 2;
    printf("%d, %d", x.a, x.b);
}

The equivalent C89 program in a subset without structs could be

#define M(T, obj, OFF) *((T*)(((char*)&(obj)) + (OFF)))

void F2() {
    char x[8];
    M(int, x, 0 /*offset of a*/) = 1;
    M(int, x, 4 /*offset of b*/) = 2;
    printf("\n");
    printf("%d, %d", M(int, x, 0), M(int, x, 4));
}

The char array represents the struct X memory, then we have to find the
offset of the members and cast to their types.
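
A variant of the same experiment, sketched with memcpy instead of a pointer cast. A plain char array is not guaranteed to be suitably aligned for int, and access through a cast pointer can run into the aliasing rules, so copying the bytes is the more conservative form. Here struct X and offsetof appear only to show where the front end's numbers would come from; generated code would carry the literal offsets. The names store_int, load_int, emulated_sum and offsets_match_real_struct are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Reference layout, used only to obtain and check the offsets. */
struct X {
    int a, b;
};

/* Hypothetical accessors: copy an int into or out of a byte buffer. */
void store_int(char *obj, size_t off, int value)
{
    memcpy(obj + off, &value, sizeof value);
}

int load_int(const char *obj, size_t off)
{
    int value;
    memcpy(&value, obj + off, sizeof value);
    return value;
}

/* Builds the "struct" in a byte buffer and returns a + b read back. */
int emulated_sum(void)
{
    char x[sizeof(struct X)];
    store_int(x, offsetof(struct X, a), 1);
    store_int(x, offsetof(struct X, b), 2);
    return load_int(x, offsetof(struct X, a))
         + load_int(x, offsetof(struct X, b));
}

/* Returns 1 when the same offsets also read a genuine struct X correctly. */
int offsets_match_real_struct(void)
{
    struct X real;
    real.a = 1;
    real.b = 2;
    return load_int((const char *)&real, offsetof(struct X, a)) == 1
        && load_int((const char *)&real, offsetof(struct X, b)) == 2;
}
```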

Does your IL have structs?

The QBE IL has aggregate types. I think this removes the need for the
front end to calculate the offsets.

https://c9x.me/compile/doc/il.html#Aggregate-Types

Thiago Adams

2024-11-27 13:27:30 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only

I tried this sample with clang and -S -emit-llvm to see if it generates
structs. The answer is yes.

https://llvm.org/docs/LangRef.html#getelementptr-instruction

struct RT {
    char A;
    int B[10][20];
    char C;
};
struct ST {
    int X;
    double Y;
    struct RT Z;
};

int *foo(struct ST *s) {
    return &s[1].Z.B[5][13];
}

The LLVM code generated by Clang is approximately:

%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }

define ptr @foo(ptr %s) {
entry:
    %arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2, i32 1, i64 5, i64 13
    ret ptr %arrayidx
}

It is not much different than C in this aspect.
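
To make the address arithmetic behind that getelementptr concrete, here is a sketch that folds the same indices into a single byte offset with sizeof and offsetof, and checks it against what foo() returns. The helper names foo_byte_offset and offsets_agree are hypothetical. On a typical 64-bit ABI with 4-byte int the terms would be 824 + 16 + 4 + 400 + 52, but the code does not depend on those values.

```c
#include <stddef.h>

struct RT { char A; int B[10][20]; char C; };
struct ST { int X; double Y; struct RT Z; };

int *foo(struct ST *s)
{
    return &s[1].Z.B[5][13];
}

/* The same address computed as one folded offset: one struct ST stride,
   two member offsets, and two array strides. */
size_t foo_byte_offset(void)
{
    return 1 * sizeof(struct ST)
         + offsetof(struct ST, Z)
         + offsetof(struct RT, B)
         + 5 * sizeof(int[20])
         + 13 * sizeof(int);
}

/* Returns 1 when the folded offset matches what foo() computes. */
int offsets_agree(void)
{
    static struct ST two[2];
    char *base = (char *)two;
    return (char *)foo(two) == base + foo_byte_offset();
}
```

A front end that emits only byte offsets would compute this sum itself; one that emits aggregate types leaves it to the backend.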

Bart

2024-11-27 15:12:39 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only

I was wondering if is possible to write C programs without struct/union?
I did this experiment.
struct X {
     int a, b;
};
void F1() {
     struct X x;
     x.a = 1;
     x.b = 2;
     printf("%d, %d", x.a, x.b);
}
The equivalent C89 program in a subset without structs could be
#define M(T, obj, OFF) *((T*)(((char*)&(obj)) + (OFF)))
void F2() {
     char x[8];
     M(int, x, 0 /*offset of a*/) = 1;
     M(int, x, 4 /*offset of b*/) = 2;
     printf("\n");
     printf("%d, %d", M(int, x, 0), M(int, x, 4));
}
The char array represents the struct X memory, then we have to find
the offset of the members and cast to their types.
Does your IL have structs?

No. It has a 'block' type which defines a fixed-length memory block. So
your struct ST type below I think would be represented as the type
'mem:824', as it is 824 bytes.

(This works fine for WinABI. But for SYS V ABI, that has a much more
complex set of rules where struct passing may depend on the types of the
members. They may be split up amongst different registers.

I'm not too worried about that however; it will only apply to structs
passed by value across an FFI, and most external libraries don't pass
by-value structs. There will also be workarounds.)

Post by Thiago Adams
The QBE IL has aggregate types. I think this removes the need for the
front end to calculate the offsets.
https://c9x.me/compile/doc/il.html#Aggregate-Types

I tried this sample with clang and -S -emit-llvm to see if it generates
structs. The answer is yes.
https://llvm.org/docs/LangRef.html#getelementptr-instruction
struct RT {
char A;
int B[10][20];
char C;
};
struct ST {
int X;
double Y;
struct RT Z;
};
int *foo(struct ST *s) {
return &s[1].Z.B[5][13];
}
%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }
define ptr @foo(ptr %s) {
entry:
%arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2,
i32 1, i64 5, i64 13
ret ptr %arrayidx
}

This example is misleading. That's the output from using -O3.
Unoptimised LLVM output is this:

---------------------------------
define dso_local ptr @foo(ptr noundef %0) #0 !dbg !10 {
%2 = alloca ptr, align 8
store ptr %0, ptr %2, align 8
#dbg_declare(ptr %2, !34, !DIExpression(), !35)
%3 = load ptr, ptr %2, align 8, !dbg !36
%4 = getelementptr inbounds %struct.ST, ptr %3, i64 1, !dbg !36
%5 = getelementptr inbounds nuw %struct.ST, ptr %4, i32 0, i32 2,
!dbg !37
%6 = getelementptr inbounds nuw %struct.RT, ptr %5, i32 0, i32 1,
!dbg !38
%7 = getelementptr inbounds [10 x [20 x i32]], ptr %6, i64 0, i64 5,
!dbg !36
%8 = getelementptr inbounds [20 x i32], ptr %7, i64 0, i64 13, !dbg !36
ret ptr %8, !dbg !39
}
---------------------------------

If you are writing the IR code, then it will be up to you to combine
that chain of constant offsets into a single offset. Otherwise it will
still be you needing to do so the other side of the IR!

(I don't know if the reduction above is done pre-LLVM or by LLVM.)

In my IL, it will generate multiple instructions, and there will be a
reduction pass, to combine instructions where possible. That's a WIP,
but such examples like yours are incredibly rare in my code-base, while
the speed-up achieved is likely to be minor. Modern CPUs are good at
running poor code fast. Mostly this just makes code more compact.

My IL for your example (I translated to my language) starts off as this:

---------------------
proc t.foo:
param u64 s
rettype u64
load u64 s
load i64 1
addpx mem:824 /824/-824 # /scale factor /extra byte offset
load i64 20
addpx u64 /1
load i64 5
addpx mem:80 /80
load i64 13
addpx i32 /4
jumpret u64 #1
#1:
retfn u64
endproc
---------------------

The reductions could also be applied during codegen to native code. But
as it is, no reductions are done, and the body of the function generates
this x64 code:

mov rax, [rbp + `t.foo.s] # or mov rax, rcx with reg allocator
lea rax, [rax+20]
lea rax, [rax+400]
lea rax, [rax+52]

Here the reduction could also be done with a peephole optimiser to
combine the three LEAs into one instruction. With 's' in a register,
probably the optimum code here would be one LEA instruction.
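
A toy sketch of that peephole step, under the strong simplifying assumption that every instruction in the run is 'lea rax, [rax + disp]' on the same register. The record type and function names are hypothetical and far simpler than a real instruction representation.

```c
#include <stddef.h>

/* Hypothetical, highly simplified instruction record: every entry means
   "lea rax, [rax + disp]". A real IL would carry opcodes and operands. */
struct lea {
    long disp;
};

/* Folds a run of n such instructions into the first slot by summing the
   displacements, and returns the new instruction count. */
size_t fold_lea_chain(struct lea *code, size_t n)
{
    size_t i;
    if (n == 0)
        return 0;
    for (i = 1; i < n; i++)
        code[0].disp += code[i].disp;
    return 1;
}

/* Applies the fold to the three displacements from the listing above. */
long folded_displacement(void)
{
    struct lea code[3];
    code[0].disp = 20;
    code[1].disp = 400;
    code[2].disp = 52;
    fold_lea_chain(code, 3);
    return code[0].disp;
}
```

For the three LEAs shown, the folded form would be a single 'lea rax, [rax+472]'.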

Bart

2024-11-27 14:33:45 UTC

Post by Bart
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists

I forgot to say that I've also tried transpiling to C from my language.
That makes some things simpler (I don't need to write the backend! And
I get optimisation for free), but C is a poor fit for my language. So
programs that need to be transpiled to C can only use a restricted,
crippled set of features.
    static void $t$f(void) {
         i64 r;
         i64 a;
         i64 b;
         i64 c;
         r = (a + (b * c));
    }
The advantage of my ILs is that they don't have the restrictions of C.
     (c | a | b) := 0         # (assign 0 to either a or b)
     (!!(c) ? a : b) = (i64)0;
But this is not legal C.

But you can write it in a different way, can't you?
This raises the question what cannot be done in C?

It's just harder. The transformations become harder if I want to
generate structured C. My syntax is expression-based, for example.

I also used a version called 'linear C', which was flattened C code, but
as I said in my last post, the result is usually a travesty of the
language. And the result absolutely needs an optimising C compiler,
otherwise the code might run at only half the speed of Tiny C.

Sometimes the C was used for distribution: so people can create binaries
locally, avoiding the AV issues with downloading binaries.

Now, for people using Windows, I'd be likely to provide a .ASM file
instead. For one thing, ASM would support inline ASM sequences in my
programs, which C can't handle.

Thiago Adams

2024-11-27 12:32:45 UTC

Post by Bart
Hard C Features

...

Post by Thiago Adams
I think K&R C is simpler than C89 that is simpler than C99 etc...
My objective is to find the minimum code generator (backend) in C and
leave the other complexities (like warnings, static analysis,
constexpr, preprocessor) to the front end.
This also makes it easier to have more than one backend sharing the job
done by the front end.
For instance, I may remove all constant expressions from the generated
code. So the backend does not have to compute constant expressions any
more and it is still C89 compatible. I removed enums, for instance.
So my question is not about how to create a simple C compiler but how
to separate and move most of the job to the front end creating a very
simple backend (code generator) which the input is code C89 compatible.
I believe that K&R C is simpler than C89, which in turn is simpler
than C99, and so on.
My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.
This approach also facilitates using multiple backends that share the
work handled by the front end.
For example, I might remove all constant expressions from the
generated code, so the backend no longer needs to compute them and
remains C89-compatible. I've already removed features like enums,
typedefs for instance.
Therefore, my question isn't about how to create a simple C compiler.
Instead, it's about how to shift most of the workload to the front
end, resulting in a very simple backend (code generator) that
processes C89-compatible code as input.
Does it make sense?

Not really. You're basically talking about using an IR or IL, which most
compilers already do, including mine now. Clang for example uses LLVM IR.

I am talking about using a subset of C89 as IL.

Post by Bart
Some languages use C as intermediate language.
But you seem to be getting C, and lower level ILs, mixed up.
If you are transpiling to C, then just generate C code, C89 if you like.
In that case you don't need to discard 90% of the language to make it
simpler! Simpler for whom?

C89 compilers that can deal with function

Post by Bart
prototypes etc already exist; you said you are not writing your own
compiler.

Post by Thiago Adams
I was wondering if splitting expressions would make the backend simpler
for instance
int r = a + b * c;
converted to
int r1 = b * c;
int r2 = a + r1;
int r = r2;

You are still talking as though YOU are writing the backend! You will
either use an existing C compiler or an existing IL backend, but there
aren't that many of the latter.

But I want to write the backend!
I want to have a simple backend (a C compiler that compiles a subset of
C89) and a complex front end (C99 C11 C23 ..C26).
At the same time I can use existing C compilers.

Post by Bart
The most famous is LLVM, but it is fantastically complex, huge and slow.
(To see examples of LLVM IR, go to godbolt.org, choose C language,
choose a Clang compiler, and enter '-S -emit-llvm' as the compiler
options. Then try an example C function in the left panel.)
I also use ILs for my compilers, but I write my own backends. I've
worked on two different kinds. One looks like a HLL, and only exists
proc F=
      int r, a, b, c
      r := a + b*c
end
    i64 r
    i64 a
    i64 b
    i64 c
!------------------------
    T1 := b * c         i64
    T2 := a + T1        i64
    r := T2             i64
!------------------------
    retproc
End
It looks great, but was hard to work with. Instead I settled on this
lower level IL, which looks like assembly. That one works also with C,
void F() {
      int r, a, b, c;
      r = a + b*c;
}
      local    i32       r.1
      local    i32       a.1
      local    i32       b.1
      local    i32       c.1
!------------------------
      load     i32       a.1              ! 00005
      load     i32       b.1              ! 00006
      load     i32       c.1              ! 00007
      mul      i32                        ! 00008
      add      i32                        ! 00009
      store    i32       r.1              ! 00010
!------------------------
      retproc                             ! 00013
endproc
(The '::' indicates an exported function, as 'static' was not used.
The .1 suffixes are to do with block scopes, since there can be multiple
'a' identifiers in a function.)
No function prototypes are needed, since everything needed is specified
at the call-site. The front-end will provide any necessary conversions or
promotions. There are attributes that appear to mark variadic calls for
example.
Only imported functions need to be listed.
This sounds vaguely like what you are trying to achieve, but you have
the idea that this IL must be C.
C however will generally need that extra info (function signatures etc)
or it will cause problems. But it is very little trouble to provide them.

I think this process will give me time.
First generating a C89 subset, then creating a compiler for this subset.
Then, with this experience, I can think about how to create my own IL, how
to add something to this C89 subset, or how to simplify it even further.

Bart

2024-11-27 14:25:53 UTC

Post by Bart
Hard C Features

...

Post by Thiago Adams
I think K&R C is simpler than C89 that is simpler than C99 etc...
My objective is to find the minimum code generator (backend) in C
and leave the other complexities (like warnings, static analysis,
constexpr, preprocessor) to the front end.
This also makes it easier to have more than one backend sharing the job
done by the front end.
For instance, I may remove all constant expressions from the generated
code. So the backend does not have to compute constant expressions
any more and it is still C89 compatible. I removed enums, for instance.
So my question is not about how to create a simple C compiler but how
to separate and move most of the job to the front end creating a very
simple backend (code generator) which the input is code C89 compatible.
I believe that K&R C is simpler than C89, which in turn is simpler
than C99, and so on.
My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.
This approach also facilitates using multiple backends that share the
work handled by the front end.
For example, I might remove all constant expressions from the
generated code, so the backend no longer needs to compute them and
remains C89-compatible. I've already removed features like enums,
typedefs for instance.
Therefore, my question isn't about how to create a simple C compiler.
Instead, it's about how to shift most of the workload to the front
end, resulting in a very simple backend (code generator) that
processes C89-compatible code as input.
Does it make sense?

Not really. You're basically talking about using an IR or IL, which
most compilers already do, including mine now. Clang for example uses
LLVM IR.

I am talking about using a subset of C89 as IL.

Post by Bart
Some languages use C as intermediate language.
But you seem to be getting C, and lower level ILs, mixed up.
If you are transpiling to C, then just generate C code, C89 if you
like. In that case you don't need to discard 90% of the language to
make it simpler! Simpler for whom?

C89 compilers that can deal with function

Post by Bart
prototypes etc already exist; you said you are not writing your own
compiler.

Post by Thiago Adams
I was wondering if splitting expressions would make the backend simpler
for instance
int r = a + b * c;
converted to
int r1 = b * c;
int r2 = a + r1;
int r = r2;

You are still talking as though YOU are writing the backend! You will
either use an existing C compiler or an existing IL backend, but there
aren't that many of the latter.

But I want to write the backend!

OK! Finally that is cleared up.

Post by Thiago Adams
I want to have a simple backend (C compiler that compiles a subset of
C89) and complex front end (C99 C11 C23 ..C26)
At same time I can use existing C compilers.

So the requirements are now different. You want to use the simplest
possible subset of C that is still valid C and can be processed and
optimised etc by any compiler, but which at a later point you will try
to implement yourself.

There are actually even smaller and cruder subsets of C that can be
used, some of which I have tried. But the generated code is dreadful,
and hard to optimise. For example I could turn my stack-based IL code
into C, an instruction at a time.

A lot of types can disappear, such as floats and doubles; they appear
only as casts.

But this would be going too far.

Thiago Adams

2024-11-27 16:06:51 UTC

Post by Bart
Hard C Features

...

Post by Thiago Adams
I think K&R C is simpler than C89 that is simpler than C99 etc...
My objective is to find the minimum code generator (backend) in C
and leave the other complexities (like warnings, static analysis,
constexpr, preprocessor) to the front end.
This also makes it easier to have more than one backend sharing the job
done by the front end.
For instance, I may remove all constant expressions from the generated
code. So the backend does not have to compute constant expressions
any more and it is still C89 compatible. I removed enums, for instance.
So my question is not about how to create a simple C compiler but
how to separate and move most of the job to the front end creating a
very simple backend (code generator) which the input is code C89
compatible.
I believe that K&R C is simpler than C89, which in turn is simpler
than C99, and so on.
My goal is to design a minimal code generator (backend) in C reading
C89, while delegating other complexities—such as warnings, static
analysis, constexpr, and preprocessing—to the front end.
This approach also facilitates using multiple backends that share
the work handled by the front end.
For example, I might remove all constant expressions from the
generated code, so the backend no longer needs to compute them and
remains C89-compatible. I've already removed features like enums,
typedefs for instance.
Therefore, my question isn't about how to create a simple C compiler.
Instead, it's about how to shift most of the workload to the front
end, resulting in a very simple backend (code generator) that
processes C89-compatible code as input.
Does it make sense?

Not really. You're basically talking about using an IR or IL, which
most compilers already do, including mine now. Clang for example uses
LLVM IR.

I am talking about using a subset of C89 as IL.

Post by Bart
Some languages use C as intermediate language.
But you seem to be getting C, and lower level ILs, mixed up.
If you are transpiling to C, then just generate C code, C89 if you
like. In that case you don't need to discard 90% of the language to
make it simpler! Simpler for whom?

C89 compilers that can deal with function

But I want to write the backend!

OK! Finally that is cleared up.

Post by Thiago Adams
I want to have a simple backend (C compiler that compiles a subset of
C89) and complex front end (C99 C11 C23 ..C26)
At same time I can use existing C compilers.

Exactly.

Tim Rentsch

2024-11-28 05:06:50 UTC

Post by Thiago Adams
Do you have any idea what else can be simplified when creating a C compiler?

What are you asking; are you thinking of writing one? Because C
compilers already exist!
If so, think of what you would find troublesome. I could create a
long list of things that makes C harder to compile than my own
language.

Hard C Features
[...]

Most of the items on your list are not really very hard. Some
are rather tedious, and some are something of a pain in the ass,
but that doesn't make implementing them hard; just tedious.

Thiago Adams

2024-11-26 19:11:45 UTC

Another question is: does the compiler care about the function type when
calling a function, or is this just information to avoid programmer
mistakes?

Consider this code:

int main() {
    strcmp("a", "b");
}

It compiles with -std=c89.

Now changing to -std=c99 or -std=c11 it gives:

error: implicit declaration of function 'strcmp'

Then adding:

int strcmp();

int main() {
    strcmp("a", "b");
}

it works in C99 / C11

I think in C23 empty parameter list means no args, while in the previous
versions (void) means no args.

Considering that in previous versions of C we could call a function
without its signature I think the compiler only needs the caller side.
(of course I am not considering programmer mistakes)

So, I think one extra simplification for small compilers is to ignore
function parameters.

Bart

2024-11-26 19:25:38 UTC

Another question is.. does the compiler cares about function type when
calling a function or this is just an information to avoid programmers
mistakes?

Yes.

It will need to know about types anyway so that it can generate the
correct code.

While for function calls, different types may be passed in different
registers.

This is less critical for 32-bit code than for 64-bit, but presumably
you will want your C89 code to be compiled to 64-bit code on 64-bit
machines?

Post by Thiago Adams
int main() {
strcmp("a", "b");
}
It compiles in -std=c89
error: implicit declaration of function 'strcmp'
int strcmp();
int main() {
strcmp("a", "b");
}
it works in C99 / C11
I think in C23 empty parameter list means no args, while in the previous
versions (void) means no args.
Considering that in previous versions of C we could call a function
without its signature I think the compiler only needs the caller side.
(of course I am not considering programmer mistakes)
So, I think one extra simplification for small compilers is to ignore
function parameters.

I don't think so. But you are welcome to look at godbolt.org and see for
yourself. Try this for example:

void F(double);
void G(int);

void H(void) {
    F(0);
    G(0);
}

Thiago Adams

2024-11-26 19:42:34 UTC

Another question is.. does the compiler cares about function type when
calling a function or this is just an information to avoid programmers
mistakes?

Yes.
It will need to know about types anyway so that it can generate the
correct code.
While for function calls, different types may be passed in different
registers.
This is less critical for 32-bit code than for 64-bit, but presumably
you will want your C89 code to be compiled to 64-bit code on 64-bit
machines?

Post by Thiago Adams
int main() {
strcmp("a", "b");
}
It compiles in -std=c89
error: implicit declaration of function 'strcmp'
int strcmp();
int main() {
strcmp("a", "b");
}
it works in C99 / C11
I think in C23 empty parameter list means no args, while in the
previous versions (void) means no args.
Considering that in previous versions of C we could call a function
without its signature I think the compiler only needs the caller side.
(of course I am not considering programmer mistakes)
So, I think one extra simplification for small compilers is to ignore
function parameters.

I don't think so. But you are welcome to look at godbolt.org and see for

Yes, I realize now that I was wrong. Considering that function calls use
registers, I think the old C model works only when passing everything on
the stack.

BGB

2024-11-26 23:37:29 UTC

Another question is.. does the compiler cares about function type
when calling a function or this is just an information to avoid
programmers mistakes?

Yes.
It will need to know about types anyway so that it can generate the
correct code.
While for function calls, different types may be passed in different
registers.
This is less critical for 32-bit code than for 64-bit, but presumably
you will want your C89 code to be compiled to 64-bit code on 64-bit
machines?

Post by Thiago Adams
int main() {
strcmp("a", "b");
}
It compiles in -std=c89
error: implicit declaration of function 'strcmp'
int strcmp();
int main() {
strcmp("a", "b");
}
it works in C99 / C11
I think in C23 empty parameter list means no args, while in the
previous versions (void) means no args.
Considering that in previous versions of C we could call a function
without its signature I think the compiler only needs the caller
side. (of course I am not considering programmer mistakes)
So, I think one extra simplification for small compilers is to ignore
function parameters.

I don't think so. But you are welcome to look at godbolt.org and see

Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything on
stack.

Can still sort of work:
If spill space is provided for any register arguments;
The mapping of arguments to registers does not depend on argument type;
Passing arguments auto-promotes to a consistent representation of the
type (say, for example, 'float' auto-promotes to 'double', and 'int'
auto-promotes to 'long', ...).

In this case, one can spill all of the register arguments into the spill
space and then be back to an old linear memory argument list.

...

Bart

2024-11-27 01:11:04 UTC

Post by BGB

Post by Bart
I don't think so. But you are welcome to look at godbolt.org and see

Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything
on stack.

If spill space is provided for any register arguments;
The mapping of arguments to registers does not depend on argument type;
Passing arguments auto-promotes to a consistent representation of the
type (say, for example, 'float' auto-promotes to 'double', and 'int'
auto-promotes to 'long', ...).

That's still full of problems, unless you impose a stricter type system
in what's left of the language (eg. revert to B or even BCPL).

Since it needs to know whether an 8-bit 255 value is sign- or
zero-extended to 32 or 64 bits.

Or whether an int value needs to be converted to float or vice versa.

There's also the question of passing structs by value.

David Brown

2024-11-27 09:53:43 UTC

Post by Thiago Adams
int strcmp();
int main() {
strcmp("a", "b");
}
it works in C99 / C11
I think in C23 empty parameter list means no args, while in the
previous versions (void) means no args.
Considering that in previous versions of C we could call a function
without its signature I think the compiler only needs the caller
side. (of course I am not considering programmer mistakes)
So, I think one extra simplification for small compilers is to ignore
function parameters.

I don't think so. But you are welcome to look at godbolt.org and see

Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything on
stack.

No, it should work for other calling conventions too. Passing
everything on the stack has not been common practice for many decades,
for most processor architectures.

What you have to consider here is the "default argument promotions". If
a function is defined to take a "double", and you call it using an int
expression without using a function prototype, the result is UB (in a
real and practical sense, not just hypothetically). It doesn't matter
if arguments are passed on the stack or registers, or if you have 32-bit
or 64-bit or any other size of cpu (I don't know why Bart thinks that
matters). It is still a disaster.

If you want to write or generate code that calls a function, you need to
know /exactly/ what type the parameters are. And you need to call it
with parameters of those types. You can do that by having a function
prototype and letting the compiler make the appropriate implicit
conversions (assuming they are allowed by the language), or you can
manually add any required conversions (such as casts) before the call,
or you can rely on the default argument promotions if you know the
result will be the correct type.

There is - to my knowledge - never a good reason for omitting a function
prototype. Implicit function declaration was IMHO one of the biggest
design flaws in pre-standard C, and allowing it to continue in C90 after
prototypes were added to the language, was a serious mistake. Compilers
should complain loudly if you try to call a function without a prototype
declaration. (I believe Bart's compiler treats it as a fatal error - it
is a non-conformity of which I approve.) And finally in C23 - some
thirty years late - the standard finally requires proper prototypes.

I am not at all sure why you are generating C90 code here. I don't
think anyone much cares about using strict C90 other than a couple of
people in this newsgroup. (People sometimes do have to limit their C to
a subset accepted by an old or limited compiler, but usually they then
also want to use extensions or post-C90 features that their compiler
supports. And are they going to use your tool?) But presumably you
know more about your potential users than I do. However, it seems to me
that you should not be considering generating code that would be
rejected by at least some C compilers - including all C23 compilers.

Thiago Adams

2024-11-27 10:52:51 UTC

I don't think so. But you are welcome to look at godbolt.org and see

Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything
on stack.

Thanks for the comments. Very useful.

There is - to my knowledge - never a good reason for omitting a function
prototype. Implicit function declaration was IMHO one of the biggest
design flaws in pre-standard C, and allowing it to continue in C90 after
prototypes were added to the language, was a serious mistake. Compilers
should complain loudly if you try to call a function without a prototype
declaration. (I believe Bart's compiler treats it as a fatal error - it
is a non-conformity of which I approve.) And finally in C23 - some
thirty years late - the standard finally requires proper prototypes.
I am not all sure why you are generating code C90 code here. I don't
think anyone much cares about using strict C90 other than a couple of
people in this newsgroup.

The objective is to leave all complications to a front end, and emit
simpler C89 code that is consumed by the backend.

The advantage, compared with a custom IL, is that the code will work
with any C compiler.

David Brown

2024-11-27 15:07:02 UTC

Post by Thiago Adams
Thanks for the comments. Very useful.

If they are of help to you, then that's great. It is not always easy to
know what will be useful.

I am not all sure why you are generating code C90 code here. I don't
think anyone much cares about using strict C90 other than a couple of
people in this newsgroup.

The objective is to leave all complications to a front end, and write a
simpler C89 code that is used by the backend.
The advantage, comparing with a custom IL, is that the code will works
in any C compiler.

I think if we can understand what you are trying to do, and why, it will
be easier to help. (I say "we" - maybe I'm the only one that doesn't
understand. But hopefully I'm not the only one that is happy to give
comments that might be useful!)

Are you trying to make your own C compiler here, divided into two parts?
If so, why? What's the motivation - what will make this compiler
different from others? As an IL, this limited C would be very poor in
terms of good code generation, static error checking, or debugging. It
could be fairly portable, but I think would lose the platform or
compiler specific features that are often useful for real programs and
libraries.

(Of course, fun, learning and your own interest is always perfectly good
motivation as far as I am concerned.)

Thiago Adams

2024-11-27 16:20:02 UTC

Post by Thiago Adams
Thanks for the comments. Very useful.

If they are of help to you, then that's great. It is not always easy to
know what will be useful.

I am not all sure why you are generating code C90 code here. I don't
think anyone much cares about using strict C90 other than a couple of
people in this newsgroup.

The objective is to leave all complications to a front end, and write
a simpler C89 code that is used by the backend.
The advantage, comparing with a custom IL, is that the code will works
in any C compiler.

Yes. I already have experience in front end, and no experience in backend.

If so, why? What's the motivation - what will make this compiler
different from others?

I hope to have a first class front end, with static analysis.

My expectations for the backend are low at this point; it is more to
understand better how everything works.

So here is the advantage of using C as output (as IL). It can be
compiled by any C compiler.

As an IL, this limited C would be very poor in
terms of good code generation, static error checking, or debugging.

I think for debugging it will be good. We can see the generated code, we
can set breakpoints, etc.

It
could be fairly portable, but I think would lose the platform or
compiler specific features that are often useful for real programs and
libraries.

This output will be for "direct consumption" my compiler -> C89 -> CC
It is header, platform and settings dependent since it is preprocessed.

But my front end also has the non-preprocessed mode -
http://thradams.com/cake/playground.html

The problem with the other mode is that it functions as code editing.
The process of generating this 'edit' is completely different from
generating code for an intermediate language (IL). So, I am creating a
new mode where I generate code instead of editing it. I believe this
approach is similar to what a typical compiler does.

(Of course, fun, learning and your own interest is always perfectly good
motivation as far as I am concerned.)

This is the backend motivation!
The front end is for fun, and for use on my own C code as a static
analyzer, and I hope more people will use it.

Michael S

2024-11-27 10:36:16 UTC

On Tue, 26 Nov 2024 16:42:34 -0300

Post by Thiago Adams
Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything
on stack.

"Old model" relies on programmer always using right types in the
function call. The F(0) call in Bart's example would not work even for
calling conventions in which both int and double are passed on the same
stack, because [in typical pre-64-bit calling conventions] they don't
occupy the same space. For a correct result you would have to write it as
F((double)0) or F(0.0).

Alternatively, the "old model" could work when all things that are allowed
to be passed as function parameters are of the same size. It seems
that's what they had in the ancestors of the C language and probably in
very early versions of C as well. It was no longer the case in the variant
of the language described by the 1st edition of K&R.

Thiago Adams

2024-11-27 11:08:34 UTC

Post by Michael S
On Tue, 26 Nov 2024 16:42:34 -0300

Post by Thiago Adams
Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing everything
on stack.

"Old model" relies on programmer always using right types in the
function call. F(0) call Bart's example would not work even for calling
conventions in which both int and double passed on the same stack,
because [in typical pre-64-bit calling conventions] they don't occupy
the same space. For correct result you would have to write it as
F((double)0) or F(0.0).
Alternatively "old model" could work when all things that are allowed
to be passed as function parameters are of the same size. It seems,
that's what they had in ancestors of C language and probably in very
early versions of C as well. It was no longer a case in variant of the
language described by 1st edition of K&R.

I will write in my own words. Correct me if I make a mistake.

Without function prototypes, the compiler will use the types it has on
the caller's side, possibly with integer promotions.

Calling a function with a double will assume the function is implemented
receiving a double.
The function implementation will need to match these types.

With function prototypes, we can call a f(int i) with f(1.1) and then
the caller side will convert before calling f.

Michael S

2024-11-27 13:01:39 UTC

On Wed, 27 Nov 2024 08:08:34 -0300

Post by Michael S
On Tue, 26 Nov 2024 16:42:34 -0300

Post by Thiago Adams
Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing
everything on stack.

"Old model" relies on programmer always using right types in the
function call. F(0) call Bart's example would not work even for
calling conventions in which both int and double passed on the same
stack, because [in typical pre-64-bit calling conventions] they
don't occupy the same space. For correct result you would have to
write it as F((double)0) or F(0.0).
Alternatively "old model" could work when all things that are
allowed to be passed as function parameters are of the same size.
It seems, that's what they had in ancestors of C language and
probably in very early versions of C as well. It was no longer a
case in variant of the language described by 1st edition of K&R.

I will write in my own words. Correct me if I make a mistake.
Without function prototypes, the compiler will use the types it has
on the caller's side, possibly with integer promotions.
Calling a function with a double will assume the function is
implemented receiving a double.
The function implementation will need to match these types.
With function prototypes, we can call a f(int i) with f(1.1) and then
the caller side will convert before calling f.

Yes. With one more complication that there was floating-point promotion
as well as integer promotion. IIRC, before they invented prototypes it
was impossible to write functions with 'float' parameters.

David Brown

2024-11-27 15:12:25 UTC

Post by Michael S
On Wed, 27 Nov 2024 08:08:34 -0300

Post by Michael S
On Tue, 26 Nov 2024 16:42:34 -0300

Post by Thiago Adams
Yes..I realized now I am wrong. Considering function calls uses
registers I think the old C model works only when passing
everything on stack.

"Old model" relies on programmer always using right types in the
function call. F(0) call Bart's example would not work even for
calling conventions in which both int and double passed on the same
stack, because [in typical pre-64-bit calling conventions] they
don't occupy the same space. For correct result you would have to
write it as F((double)0) or F(0.0).
Alternatively "old model" could work when all things that are
allowed to be passed as function parameters are of the same size.
It seems, that's what they had in ancestors of C language and
probably in very early versions of C as well. It was no longer a
case in variant of the language described by 1st edition of K&R.

I will write in my own words. Correct me if I make a mistake.
Without function prototypes, the compiler will use the types it has
on the caller's side, possibly with integer promotions.
Calling a function with a double will assume the function is
implemented receiving a double.
The function implementation will need to match these types.
With function prototypes, we can call a f(int i) with f(1.1) and then
the caller side will convert before calling f.

Before prototypes were added to C, I am not sure C was standardised
enough to even ask about such functions! But AFAIK it is not possible
to call a function with float arguments unless you use prototype
declarations, even with modern C.

The term you are looking for here is "default argument promotions". You
get standard integer promotions on integer types of lower rank than
"int", and floats are promoted to doubles.
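A minimal sketch of the default argument promotions. It uses variadic
functions, because their trailing arguments are promoted the same way as
arguments to an unprototyped function, and unprototyped declarations were
removed in C23. The helper names sum_as_double and sum_as_int are made up
for illustration:

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` trailing arguments, reading each
   one as double. A float passed here arrives already promoted to double. */
double sum_as_double(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Hypothetical helper: the same idea for integers. char and short
   arguments arrive promoted to int, so va_arg has to ask for int. */
int sum_as_int(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

With a float variable f, a call such as sum_as_double(2, f, 2.5) passes f as
a double. Asking va_arg for float instead of double would be undefined
behaviour, which mirrors the mismatch problem discussed above.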

Tim Rentsch

2024-11-27 23:18:33 UTC

Post by Michael S
On Wed, 27 Nov 2024 08:08:34 -0300

Post by Thiago Adams
Without function prototypes, the compiler will use the types it
has on the caller's side, possibly with integer promotions.
Calling a function with a double will assume the function is
implemented receiving a double.
The function implementation will need to match these types.
With function prototypes, we can call a f(int i) with f(1.1) and
then the caller side will convert before calling f.

Yes. With one more complication that there was floating-point
promotion as well as integer promotion. IIRC, before they
invented prototypes it was impossible to write functions with
'float' parameters.

I believe the last statement there isn't exactly right. If we
have a K&R-style function definition, as for example

int
not_too_small( f )
float f;
{
return f > 1.e-250;
}

that declares a parameter 'f' with type float, then as far as the
body of the function is concerned the type of 'f' is float. It is
of course true that, as far as callers are concerned, the function
expects a double argument, but that isn't the same as a double
parameter. Consider this similar function:

int
not_too_small_two( f )
double f;
{
return f > 1.e-250;
}

These two functions do not have the same behavior. The reason for
that difference is that in one case any argument value is left as
is (as a double), and is treated as such, and in the other case
any argument value is converted (inside the function body) from a
double to a float, and is treated within the function body as a
float object, not as a double object. (Calling with 1.e-200 as
an argument should show one case of different behaviors.)
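K&R-style definitions are no longer accepted in C23, so the difference can
be sketched with prototyped functions that emulate what the two bodies see.
This is only an approximation of the functions above, with made-up names,
and it assumes an IEEE-754 float with round-to-nearest conversion:

```c
/* Emulates the `float f;` version: the caller passes a double, and the
   body works on that value narrowed to float. The volatile qualifier is
   there so the narrowed value is really stored as a float. */
int not_too_small_as_float(double arg)
{
    volatile float f = (float)arg;
    return f > 1.e-250;
}

/* Emulates the `double f;` version: the argument is used as is. */
int not_too_small_as_double(double f)
{
    return f > 1.e-250;
}
```

Under those assumptions, 1.e-200 is far below the smallest positive float
and narrows to zero, so the first function returns 0 while the second
returns 1.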

Bart

2024-11-26 19:18:13 UTC

I don't use them in generated code either. (Only in a brief section at
the top to define my preferred type designations.)

I don't use #include either, not even for standard headers, although gcc
doesn't like it when I define my own std library functions. There are
ways to shut it up though.

Post by Thiago Adams
I am also generating the structs as required (on demand). For the
structs I am renaming it because I am generating all structs at global
scope.
Does the compiler/linker care if I am lying about const? Or if I rename
the structs?
I think it does not care, and it seems to work (it compiles and runs).

I don't know why the linker would care about anything. All it sees are
symbol imports and exports.

A compiler might care about the lack of 'const' in that it could stop it
doing some optimisations.

But it can't report a type mismatch between 'const' and non-'const'
types if 'const' has been banished completely.

Thiago Adams

2024-11-26 19:38:23 UTC

I don't use them in generated code either. (Only in a brief section at
the top to define my preferred type designations.)

I am generating the prototypes for all functions called.
No includes, no macros.

The generated code depends on compiler flags, platform and
headers. It is intended for direct use in a pipeline like

use my_compiler -> C89 -> CC

I don't use #include either, not even for standard headers, although gcc
doesn't like it when I define my own std library functions. There are
ways to shut it up though.

Post by Thiago Adams
I am also generating the structs as required (on demand). For the
structs I am renaming it because I am generating all structs at global
scope.
Does the compiler/linker care if I am lying about const? Or if I
rename the structs?
I think it does not care, and it seems to work (it compiles and runs).

I think GCC has some builtin functions and it can complain if the
function prototype differs.

Do you have any idea what else can be simplified when creating a C compiler?

I was thinking about literal strings.

Instead of

f("abc");

Generating something like

char literal_string_1[] = {'a', 'b', 'c', '\0' }; //or numbers global

f(literal_string_1);

The reason is that I believe compilers already have to do this, right?
(Put strings in a data section.)
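A small sketch of that lowering, for illustration only. The name
literal_string_1 follows the post; the body of f is a stand-in, since the
post does not say what f does:

```c
#include <string.h>

/* The generated global, spelled out element by element as in the post. */
char literal_string_1[] = { 'a', 'b', 'c', '\0' };

/* Stand-in for the callee f: returns the length of its argument. */
int f(char *s)
{
    return (int)strlen(s);
}

/* Original form: the argument is a string literal. */
int call_with_literal(void)
{
    return f("abc");
}

/* Lowered form: the argument is the generated array. */
int call_with_lowered(void)
{
    return f(literal_string_1);
}
```

The two forms are not identical in every respect: the generated array is an
ordinary modifiable object with its own address, whereas identical string
literals may share storage and modifying one is undefined behaviour.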

So this was one extra simplification I was thinking about.
Also remove loops (for, while) and switches.
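As a sketch of what removing loops and switches could look like, here are a
for loop and a switch next to hand-lowered versions that use only if and
goto. The function and label names are made up for illustration:

```c
/* Structured form. */
int sum_with_for(int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += i;
    return total;
}

/* Lowered form: the loop becomes a test, a body and two jumps. */
int sum_with_goto(int n)
{
    int i = 0, total = 0;

loop_test:
    if (!(i < n))
        goto loop_end;
    total += i;
    i++;
    goto loop_test;
loop_end:
    return total;
}

/* Structured form. */
int classify_with_switch(int x)
{
    switch (x) {
    case 0:  return 10;
    case 1:  return 20;
    default: return -1;
    }
}

/* Lowered form: the switch becomes a chain of compare-and-jump. */
int classify_with_goto(int x)
{
    if (x == 0)
        goto case_0;
    if (x == 1)
        goto case_1;
    goto case_default;
case_0:
    return 10;
case_1:
    return 20;
case_default:
    return -1;
}
```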

I was considering removing sizeof, but I think it will be part
(together with structs and alignment) of any compiler, even simpler ones.

When thinking about this, I consider that the first version of C, K&R, was
already very complete in terms of code generation.

The evolution of C is much more about conveniences like enums and compound
literals than about anything created for code generation.

Waldek Hebisch

2024-11-27 10:29:55 UTC

Post by Thiago Adams
(I think I know the answer but I would like to learn more.)
I am using C89 as "compiler backend intermediate language".
I want a very simple output that could facilitate the construction of a
simple C89 compiler focused on code generation.
I am removing these features from the generated code.
- typedef
- enum
- preprocessor
- const
At this output, I am generating the prototypes for the functions I call.
For instance,
int strcmp( const char* lhs, const char* rhs );
is generated as
int strcmp( char* lhs, char* rhs );
I am also generating the structs as required (on demand). For the
structs I am renaming it because I am generating all structs at global
scope.
Does the compiler/linker care if I am lying about const? Or if I rename
the structs?

1) Compilers for embedded targets care very much about const. const
qualified arrays go into a read-only data section, which is typically
located in flash. Other arrays go to RAM. Embedded targets
frequently have very small RAM and larger flash, so after
dropping const the program may no longer fit in available RAM.
2) Linkers for non-standard formats may do whatever they please. In
particular compiler could emit type information and linker may
check it.
3) If you want later C, then C89 as an intermediate format will not
work; basically it is hard to implement VLAs without VLAs
in the target language. In principle you can replace VMTs (variably
modified types) by pointer arithmetic, but then you have a nontrivial
chunk of code generator inside your front end.
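To illustrate point 3, here is a sketch of replacing an access through a
variably modified type with explicit pointer arithmetic. The function names
are made up, and the first function needs a compiler with C99 VLA support:

```c
#include <stddef.h>
#include <stdlib.h>

/* C99 form: `a` has a variably modified type, so the compiler
   generates the row-stride arithmetic itself. */
int get_vm(size_t cols, int (*a)[cols], size_t i, size_t j)
{
    return a[i][j];
}

/* C89-style lowering: a flat pointer plus explicit index arithmetic,
   which the front end now has to generate on its own. */
int get_flat(int *base, size_t cols, size_t i, size_t j)
{
    return base[i * cols + j];
}

/* Checks that both forms read the same elements of one allocation.
   Returns 1 on agreement, 0 on mismatch or allocation failure. */
int lowering_matches(size_t rows, size_t cols)
{
    size_t i, j;
    int ok = 1;
    int *flat;

    if (rows == 0 || cols == 0)
        return 1;
    flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return 0;
    for (i = 0; i < rows * cols; i++)
        flat[i] = (int)i;
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            if (get_vm(cols, (int (*)[cols])flat, i, j)
                    != get_flat(flat, cols, i, j))
                ok = 0;
    free(flat);
    return ok;
}
```

This only covers element access. sizeof on a VLA and automatic storage for
VLA objects would need their own lowering.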

--
Waldek Hebisch

Thiago Adams

2024-11-27 11:03:47 UTC

Post by Waldek Hebisch

I think your comment applies for const in declarations like

const int i = 1;

I used to find const confusing, as it sometimes meant 'read-only' and
other times 'immutable.'

Now, it seems less confusing to me. When const is used with variables
that can be initialized (init-declarator), it acts as 'immutable',
meaning the storage is constant.

In other contexts, like function parameters, const means 'read-only'
because we don’t know if the storage is constant or not.

From the backend perspective, const used as 'immutable' may be
useful. The other one, 'read-only', is the one I think can be ignored by
this simpler C89, provided it does not cause linker problems.

Post by Waldek Hebisch
2) Linkers for non-standard formats may do whatever they please. In
particular compiler could emit type information and linker may
check it.
3) If you want later C, then C89 as an intermediate format will not
work; basically it is hard to implement VLAs without VLAs
in the target language. In principle you can replace VMTs by
pointer arithmetic, but then you have a nontrivial chunk of
code generator inside your front end.

I think in some cases generated C code can have limitations. I remember
reading, probably in "The Design and Evolution of C++", that exceptions
were hard to implement and that this was a motivation to stop generating
C code in the Cfront compiler.

Keith Thompson

2024-11-28 09:25:48 UTC

[...]

Post by Waldek Hebisch
1) Compilers for embedded targets care very much about const. const
qualified arrays go into a read-only data section, which is typically
located in flash. Other arrays go to RAM. Embedded targets
frequently have very small RAM and larger flash, so after
dropping const the program may no longer fit in available RAM.

I think your comment applies for const in declarations like
const int i = 1;
I used to find const confusing, as it sometimes meant 'read-only' and
other times 'immutable.'

I'm not sure what you mean. My understanding is that const means
read-only, and nothing else.

Post by Thiago Adams
Now, it seems less confusing to me. When const is used with variables
that can be initialized (init-declarator), it acts as 'immutable',
meaning the storage is constant.

What exactly do you mean by "the storage is constant"? Are you talking
about memory that is marked as read-only by the OS?

Given something like:

const int r = rand();

at block scope, the object will almost certainly be stored in ordinary
read/write memory. The compiler will flag code that attempts to modify
it (unless you play tricks with pointer casts, which can introduce
undefined behavior). But if I do something like `*(int*)&r = 42;`,
it's likely to "work".

Defining an object as const can *enable* a compiler to store it in
read-only memory (enforced by the OS, or maybe even physical RAM on some
systems), but that's an implementation choice, not part of the semantics
of const.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Thiago Adams

2024-11-28 11:19:32 UTC

I think your comment applies for const in declarations like
const int i = 1;
I used to find const confusing, as it sometimes meant 'read-only' and
other times 'immutable.'

I'm not sure what you mean. My understanding is that const means
read-only, and nothing else.

I think my previous comment is not precise; it could be better phrased.
It also has some mistakes about init-declarator.

I will give samples what I was trying to say.

When we have this declaration we are declaring some storage (for the i
variable)

const int i = 0;

But here

void f(const struct X * p);

We are not declaring the storage for the pointed object.

So, for the first case, we can think of const as declaring immutable
storage, while for the second sample const acts as "read-only" - we
don't know if the storage is const or not.
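The second case can be sketched as follows. The struct layout and the
function names are assumptions made for the example; only the shape of the
parameter, const struct X *p, comes from the post:

```c
/* Hypothetical layout for struct X. */
struct X {
    int value;
};

/* The const here only restricts this function: it may not write
   through p. It says nothing about the storage p points to. */
int read_value(const struct X *p)
{
    return p->value;
}

int demo_read_only_view(void)
{
    struct X x = { 1 };           /* ordinary, writable storage */
    int before = read_value(&x);  /* seen through a read-only view */

    x.value = 2;                  /* still fine: x itself is not const */
    return before * 10 + read_value(&x);
}
```

Here the same object is read through a pointer to const and modified
through its own name, which is valid because the object was never defined
as const.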

Post by Thiago Adams
Now, it seems less confusing to me. When const is used with variables
that can be initialized (init-declarator), it acts as 'immutable',
meaning the storage is constant.

What exactly do you mean by "the storage is constant"? Are you talking
about memory that is marked as read-only by the OS?

Here comes another point (that I realized after I wrote that) and that
makes const more confusing.

When const is used in an external declaration like

const int i = 1;
int main(){}

We can think about read-only marked memory.

But for local variables it does not make sense to have "read-only marked
memory" because they live on the stack.

int main(){
const int i = 1;
}

Post by Keith Thompson
const int r = rand();
at block scope, the object will almost certainly be stored in ordinary
read/write memory. The compiler will flag code that attempts to modify
it (unless you play tricks with pointer casts, which can introduce
undefined behavior). But if I do something like `*(int*)&r = 42;`,
it's likely to "work".
Defining an object as const can *enable* a compiler to store it in
read-only memory (enforced by the OS, or maybe even physical RAM on some
systems), but that's an implementation choice, not part of the semantics
of const.
[...]

Yes, you have pointed out what I realized after writing this. Thanks for
paying attention to these details.

const is very context dependent, maybe from trying to reuse the same
keyword, and I think C23 had a chance to clarify it, but instead made it
more confusing with constexpr; that was the point of my previous topic.

For compile-time computation, what matters is the guarantee that the
compiler knows the value (it knows because it is always the same value as
the initializer) when using the object. (It does not depend on flow analysis.)

I think const, like in here

const int i = 1;

gives the same guarantee. (The compiler knows the value of i)

What I think could be explored more is the usage of register keyword as
meaning "no-storage".

The idea of const no-storage is good because it eliminates any problem
with object lifetime and it makes the perfect constants in my view.
Unfortunately, constexpr does not mean that because we can take the
address of constexpr object.

Sample why no-storage is useful

void F()
{
register const int i = 1;
//let's say we have lambdas in C
f( []()
{
//safe to use i even in another thread, or even after exiting F
int k = i;
}
);
}

Bart

2024-11-28 11:38:25 UTC

What exactly do you mean by "the storage is constant"? Are you talking
about memory that is marked as read-only by the OS?

Here comes another point (that I realized after I wrote that) and that
makes const more confusing.

I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.

That's because both really apply to /types/, not directly to variables.

So both const and a VLA can be specified deep inside a type-spec, where
there may be no storage allocated, inside a cast for example, but here's
a simpler one:

int n;
const int (*A)[n];

This declares a pointer to a VLA, so no storage is allocated for the
VLA. (I'm not even sure how you'd allocate it in the program, given that
VLAs normally go on the stack.)

And the 'const' applies to the array elements, which here don't exist.
To make the pointer itself const, then 'const' needs to be at the
top-level, which bizarrely needs to go not only in the middle, but
/after/ the pointer which is to be const:

const int (*const A)[n];

Keith Thompson

2024-11-28 19:58:36 UTC

Bart <***@freeuk.com> writes:
[...]

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given

const int n = 42;

n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const. I'm not sure what's so confusing
about that. If const applied *directly* to variables, it's hard to see
how something like &n could be treated consistently.

"const" has to apply to types anyway. Are you suggesting that it should
have an additional meaning when applied to variables? What would be the
advantage of that?

Post by Bart
So both const and a VLA can be specified deep inside a type-spec, where
there may be no storage allocated, inside a cast for example, but
int n;
const int (*A)[n];
This declares a pointer to a VLA, so no storage is allocated for the
VLA. (I'm not even sure how you'd allocate it in the program, given
that VLAs normally go on the stack.)

It declares *and defines* (allocates storage for) a pointer to a VLA.
You could allocate the array any way you like. For example:

int n = 42;
const int (*A)[n];
const int vla[n];
A = &vla;

(Though it's hard to see how you'd initialize the elements of `vla`.)
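One possible way around the initialization problem, sketched under the
assumption that heap storage is acceptable: fill the elements through a
non-const pointer first, then look at them through the pointer to array of
const int. The function name is made up, and VLA support is required:

```c
#include <stdlib.h>

/* Returns 0 + 1 + ... + (n - 1), read back through a const view.
   Returns 0 for n <= 0 and -1 if allocation fails. */
int sum_through_const_view(int n)
{
    int *storage;
    int i, total = 0;

    if (n <= 0)
        return 0;
    storage = malloc((size_t)n * sizeof *storage);
    if (storage == NULL)
        return -1;
    for (i = 0; i < n; i++)
        storage[i] = i;          /* writable while we set it up */
    {
        /* Same declarator shape as in the thread: pointer to an
           array of n const int. The cast adds the qualifier. */
        const int (*A)[n] = (const int (*)[n])storage;

        for (i = 0; i < n; i++)
            total += (*A)[i];
        /* (*A)[0] = 1;  would be rejected: the elements are const */
    }
    free(storage);
    return total;
}
```

The array here lives on the heap rather than the stack, so the pointer to a
VLA type is useful even when no VLA object is ever declared.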

Post by Bart
And the 'const' applies to the array elements, which here don't exist.

Right, just as in
const int *ptr;
the const applies to an int object which doesn't yet exist.

Post by Bart
To make the pointer itself const, then 'const' needs to be at
the top-level, which bizarrely needs to go not only in the middle, but
const int (*const A)[n];

Yes, C's declaration syntax can be confusing. I almost certainly
wouldn't have done it that way if I were designing the language
from scratch. But it's unambiguous, it's not going to change,
and complaining about it does no good. (Have you ever considered
trying to help people understand it?)

The issue you mention isn't directly related to "const"; it applies
equally to all type qualifiers (const, restrict, volatile, _Atomic).

cdecl is a good tool for unravelling complex declarations. It doesn't
handle VLAs, but there are workarounds for that :

$ cdecl
Type `help' or `?' for help
cdecl> explain const int (*A)[n]
syntax error
cdecl> explain const int (*A)[42]
declare A as pointer to array 42 of const int

So A is a pointer to array n of const int.


Bart

2024-11-28 22:23:29 UTC

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

But that is a separate thing. Suppose T was an alias for 'const int'. Then:

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer

'const' is out of the picture. Other languages tend to have special
keywords that apply to the variable declaration, not the type, for example:

let x:int # non-mutable
var y:int* # mutable (using whatever pointer syntax)

'const' in C looks like it works like that, but it doesn't. There are also
examples like this:

int const * const p;

Here storage for p is allocated, but it is the second 'const' that makes
it readonly. The first 'const' is not involved in allocation at all.
This is easy to get mixed up.

Post by Keith Thompson
I'm not sure what's so confusing
about that. If const applied *directly* to variables, it's hard to see
how something like &n could be treated consistently.
"const" has to apply to types anyway. Are you suggesting that it should
have an additional meaning when applied to variables? What would be the
advantage of that?

It declares *and defines* (allocates storage for) a pointer to a VLA.

VLAs are mostly linked to stack allocation. But that only applies when
the array is at the top level of the type spec, in the same way that
it's the top-level 'const' that would determine whether storage is
read-only - if declaring a variable.

As I said, other languages tend to only have that top-level aspect. I
consider that less confusing. I don't think you'd see multiple 'let' or
'mut' keywords within one variable declaration.

Keith Thompson

2024-11-28 22:38:30 UTC

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you have
a point, it escapes me.

Post by Bart
Other languages tend to have special
let x:int # non-mutable
var y:int* # mutable (using whatever pointer syntax)

Yes, other languages are different. Few, if any, languages that
are not based on C have adopted C's odd declaration syntax.

Post by Bart
'const' in C looks like it works like that, but it doesn't.

It doesn't look like it works like that if you understand how it
actually does work.

Post by Bart
There also
int const * const p;
Here storage for p is allocated, but it is the second 'const' that
makes it readonly. The first 'const' is not involved in allocation at
all. This is easy to get mixed up.

Yes, and you seem determined to make it easier to get mixed up.

[...]

Post by Bart
VLAs are mostly linked to stack allocation. But that only applies when
the array is at the top level of the type spec, in the same way that
it's the top-level 'const' that would determine whether storage is
read-only - if declaring a variable.
As I said, other languages tend to only have that top-level aspect. I
consider that less confusing. I don't think you'd see multiple 'let'
or 'mut' keywords within one variable declaration.

Other languages confuse you less than C does. We know.


Bart

2024-11-28 23:05:09 UTC

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you have
a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly by
only looking at this.

Post by Keith Thompson
Yes, and you seem determined to make it easier to get mixed up.

C doesn't require any help from me for confusing features. The OP said
it was confusing and I tried to point out why it might be.

Obviously you as C expert will never be confused. But there are lots of
less expert users of the language.

I've just spent several minutes trying to figure out why all these
assignments are invalid:

typedef int* T;

int const x;
T const y;
int* const z;

x=0;
y=0;
z=0;

because I thought they would behave differently, with 'const' being on the
opposite side of '*' to the base-type.

I forgot that here it would be the right-most 'const' that controls
storage attributes of 'z'.
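A compilable sketch of the y and z cases, with initializers added so the
objects have defined values. The typedef is the one from the post; the
function name is made up:

```c
typedef int *T;   /* as in the post */

int demo_typedef_const(void)
{
    int a = 1;
    T const y = &a;     /* const applies to T as a whole: int *const */
    int *const z = &a;  /* the same type, spelled without the typedef */

    *y = 5;             /* allowed: the pointed-to int is not const */
    *z += 1;            /* allowed for the same reason */
    /* y = 0;  z = 0;   both rejected: the pointers themselves are const */
    return a;
}
```

So with the typedef, 'const' on either side of T qualifies the pointer, never
the int behind it, which is where textual substitution of int* for T gives
the wrong intuition.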

You will of course say that I'm the only person in the world who could
make that mistake.

Keith Thompson

2024-11-28 23:20:17 UTC

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you
have a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly
by only looking at this.

Yes, you said that T is an alias for 'const int'. Not sure why you
wrote "alias". Is it a macro, or a typedef, or something else?

I suggest that hiding "const" behind a macro or typedef is usually a bad
idea. Why did you do it here? Is your example based on real code, or
did you contrive it to be as confusing as possible?

Post by Keith Thompson
Yes, and you seem determined to make it easier to get mixed up.

C doesn't require any help from me for confusing features.

No, but people using C sometimes require help in resolving their
confusion rather than reinforcing it.

Post by Bart
The OP said
it was confusing and I tried to point out why it might be.
Obviously you as C expert will never be confused. But there are lots
of less expert users of the language.

Not true. I am occasionally confused. I just don't brag about it, and
I'd rather help others avoid confusion than add to it.

[...]


Bart

2024-11-29 00:32:52 UTC

Post by Bart
T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you
have a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly
by only looking at this.

Yes, you said that T is an alias for 'const int'. Not sure why you
wrote "alias". Is it a macro, or a typedef, or something else?
I suggest that hiding "const" behind a macro or typedef is usually a bad
idea. Why did you do it here? Is your example based on real code, or
did you contrive it to be as confusing as possible?

It's to illustrate that the constness of a variable may depend on
something which is remote from its declaration.

Which is unlike how it usually works elsewhere.

(And if it matters, the alias used a typedef.)

For extra confusion, consider this version:

T x, *y;

The storage for x is read-only; for y it isn't. Or is it the other way
around?
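For reference, a sketch that lets the compiler settle the question, with the
alias written as a typedef (as stated above) and initializers added. The
function name and the second object are made up for the example:

```c
typedef const int T;   /* the alias, as a typedef */

int demo_alias(void)
{
    T x = 1, *y;       /* x is a const int; y is a pointer to const int */
    T other = 2;

    y = &x;            /* allowed: y itself is not const */
    y = &other;        /* allowed again: y can be re-pointed */
    /* x = 3;          rejected: x is read-only */
    /* *y = 3;         rejected: y points to const int */
    return x + *y;
}
```

So x is the read-only one, and y is an ordinary writable pointer whose
target is read-only through it.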

Keith Thompson

2024-11-29 02:15:39 UTC

Post by Bart
T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you
have a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly
by only looking at this.

Yes, you said that T is an alias for 'const int'. Not sure why you
wrote "alias". Is it a macro, or a typedef, or something else?
I suggest that hiding "const" behind a macro or typedef is usually a
bad idea. Why did you do it here? Is your example based on real
code, or did you contrive it to be as confusing as possible?

It's to illustrate that the constness of a variable may depend on
something which is remote from its declaration.
Which is unlike how it usually works elsewhere.
(And if it matters, the alias used a typedef.)
T x, *y;
The storage for x is read-only; for y it isn't. Or is it the other way
around?

Yes, deliberately confusing code is confusing. Yes, different languages
are different. Any problems with your code snippets can be solved by
writing them more straightforwardly.


David Brown

2024-11-29 07:38:29 UTC

[...]

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you have
a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly by
only looking at this.

Post by Keith Thompson
Yes, and you seem determined to make it easier to get mixed up.

That one is really simple - clearly "x" is declared "const", and so you
can't assign to it later.

Post by Bart
T const y;

That one is equally simple - clearly "y" is declared "const", and so you
can't assign to it later.

That shows exactly why it can be a good idea to use "typedef", even for
relatively simple things such as adding a pointer or qualifiers to the
type. I would not normally make a typedef just for a pointer, or just
to add a "const" qualifier, but if the final type is complicated, it can
make things clearer. Really, it's just like breaking a complicated
expression into parts with extra local variables to name them.

Post by Bart
int* const z;

This one requires a little more thought, but it should be well within
the capacity of any C programmer to see that "z" is declared "const".

Post by Bart
x=0;
y=0;
z=0;
because I thought they would behave differently, with 'const' being on the
opposite side of '*' to the base-type.

The "const" in each case clearly applies to the type of the declared
variable.

Post by Bart
I forgot that here it would be the right-most 'const' that controls
storage attributes of 'z'.
You will of course say that I'm the only person in the world who could
make that mistake.

I certainly don't say you are the only person in the world who /could/
make that mistake. But if we narrow it down to the C programmers who
/would/ make such mistakes, you are in a much smaller group.

There are two types of C programmers - those who like to declare
variables "const" on a regular basis, and those who rarely if ever do
(reserving "const" for things like "const int *" pointers). Those that
declare const variables, initialise them at the time, and would not be
trying to assign them later. Those who don't have much use of const
variables, would not be writing such code in the first place.

Basically, I'd only expect to see examples like that in the questions
section of a beginner's book on C. And I'd expect anyone who had read
the preceding chapter to be able to answer it.

C's syntax for types certainly can be confusing and awkward, especially
in complex situations. Mix pointers, qualifiers, function types, and
array syntax together and you can easily make a mess that will take a
lot of effort to unpick.

So the answer is /very/ simple - don't do that.

Make an effort to write your own types clearly, in a way that makes them
obvious to the reader. "typedef" is your friend. If you have a very
complex type (which are rare in practice, but do occur), build it in
parts with typedefs, and give it a typedef itself. Then it is vastly
easier to use multiple times.
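As an illustration of building a type in parts, here is a sketch with
made-up names: a read-only table of pointers to functions that take a
const char * and return int.

```c
#include <string.h>

typedef int handler_fn(const char *text);   /* a function type */
typedef handler_fn *handler_ptr;            /* pointer to such a function */
typedef handler_ptr handler_table[2];       /* array of two such pointers */

/* Two example handlers. */
int count_chars(const char *text)
{
    return (int)strlen(text);
}

int is_empty(const char *text)
{
    return text[0] == '\0';
}

/* Without the typedefs this declaration would read:
   int (*const handlers[2])(const char *) */
const handler_table handlers = { count_chars, is_empty };

int run_handler(int index, const char *text)
{
    return handlers[index](text);
}
```

Each typedef adds one layer (function, pointer, array), so the final
declaration only needs the top-level const.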

You might occasionally have to understand someone else's messy type, but
you can at least make life easier for yourself. I certainly do that.

Bart

2024-11-29 11:04:16 UTC

[...]

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

T x; // defines a readonly variable (which probably needs
// initialising)
T* y; // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you have
a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly
by only looking at this.

Post by Keith Thompson
Yes, and you seem determined to make it easier to get mixed up.

That one is really simple - clearly "x" is declared "const", and so you
can't assign to it later.

Post by Bart
T const y;

That one is equally simple - clearly "y" is declared "const", and so you
can't assign to it later.
That shows exactly why it can be a good idea to use "typedef", even for
relatively simple things such as adding a pointer or qualifiers to the
type. I would not normally make a typedef just for a pointer, or just
to add a "const" qualifier, but if the final type is complicated, it can
make things clearer. Really, it's just like breaking a complicated
expression into parts with extra local variables to name them.

Post by Bart
int* const z;

This one requires a little more thought, but it should be well within
the capacity of any C programmer to see that "z" is declared "const".

These are similar examples:

int * const z1;
int const * z2;

z1=0; // invalid
z2=0; // valid
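The same two declarations in a compilable sketch, with initializers added
and the rejected statements left as comments. The function name is made up:

```c
int demo_pointer_const(void)
{
    int a = 1, b = 2;
    int *const z1 = &a;   /* const pointer to int */
    int const *z2 = &a;   /* pointer to const int */

    *z1 = 10;             /* valid: the int that z1 points to is writable */
    z2 = &b;              /* valid: z2 itself is writable */
    /* z1 = &b;           invalid: z1 is const */
    /* *z2 = 0;           invalid: z2 points to const int */
    return a + *z2;
}
```

Each declaration forbids exactly one of the two operations: assigning to the
pointer, or assigning through it.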

Post by Bart
x=0;
y=0;
z=0;
because I thought they would behave differently, with 'const' being on the
opposite side of '*' to the base-type.

The "const" in each case clearly applies to the type of the declared
variable.

Post by Bart
I forgot that here it would be the right-most 'const' that controls
storage attributes of 'z'.

Both 'const' in my new examples are the right-most one! Yet one
makes the immediate storage const and one doesn't. I guess then that
it's the right-most possible 'const' if it were to be used. In my
example, that would follow the '*'.

Post by Bart
You will of course say that I'm the only person in the world who could
make that mistake.

My original point was trying to address why 'const' as used in C might
be confusing. I was trying to compare how non-mutable variables are
designated in other languages. There it's a keyword that is part of the
declaration syntax, not the type syntax.

I suggested that C is confusing because 'const' looks as though it's
like the former, but it's part of the latter. Which also means you can
have multiple 'const' in a declaration (putting aside repeated 'const's).

So objectively, it IS more complicated than elsewhere with more scope
for getting it wrong.

But of course, this group being what it is, people have to turn it round
to make it about me: I'm deliberately trying to show confusing examples.
Or I'm too thick to understand how const works.

Ike Naar

2024-11-29 12:06:14 UTC

Post by Bart
int * const z1;
int const * z2;
z1=0; // invalid
z2=0; // valid
[snip]
Both 'const' in my new examples are the right-most one! Yet one
makes the immediate storage const and one doesn't. I guess then that
it's the right-most possible 'const' if it were to be used. In my
example, that would follow the '*'.

The order in which '*' and 'const' appear matters.
With

int * const z1;

the const applies to z1 because it appears immediately before 'z1'.
z1 is a const pointer to int.
(hint: read the declaration out loud from right to left).

*z1 = 0; /* valid */
z1 = 0; /* invalid, z1 is readonly */

With

int const * z2;

the const applies to *z2 because it appears immediately before '* z2'.
*z2 is a const int, z2 is a pointer to const int.
(again, read the declaration out loud from right to left).

*z2 = 0; /* invalid, *z2 is readonly */
z2 = 0; /* valid */

With

int const * const z3;

the leftmost const applies to *z3 and the rightmost const applies to z3.
z3 is a const pointer to const int.

*z3 = 0; /* invalid, *z3 is readonly */
z3 = 0; /* invalid, z3 is readonly */
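
The three cases above can be put into one compilable sketch. The function name and the variables a and b are made up for illustration, and the pointers are initialised here (the declarations quoted above leave them uninitialised). The invalid assignments are left in comments, since a conforming compiler must diagnose them.

```c
#include <assert.h>

/* Hypothetical helper: exercises the three declarations discussed above. */
int const_placement_demo(void)
{
    int a = 1, b = 2;

    int * const z1 = &a;         /* const pointer to int */
    int const * z2 = &a;         /* pointer to const int */
    int const * const z3 = &b;   /* const pointer to const int */

    *z1 = 10;         /* valid: the int that z1 points to is writable */
    /* z1 = &b; */    /* invalid: z1 itself is read-only */

    z2 = &b;          /* valid: z2 itself can be reseated */
    /* *z2 = 0; */    /* invalid: *z2 is read-only */

    /* *z3 = 0; */    /* invalid: *z3 is read-only */
    /* z3 = &a; */    /* invalid: z3 is read-only */

    assert(a == 10 && *z2 == 2 && *z3 == 2);
    return a;
}
```

Uncommenting any of the lines marked invalid should produce a diagnostic about assigning to a read-only object.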

Michael S

2024-11-29 12:28:10 UTC

On Fri, 29 Nov 2024 11:04:16 +0000

Post by Bart
My original point was trying to address why 'const' as used in C
might be confusing. I was trying to compare how non-mutable variables
are designated in other languages. There it's a keyword that is part
of the declaration syntax, not the type syntax.

I don't see your point.
How 'mut' in Rust is different from 'const' in C except for having
opposite polarity?
How 'readonly' in C# is different from 'const' in C?

Post by Bart
I suggested that C is confusing because 'const' looks as though it's
like the former, but it's part of the latter. Which also means you
can have multiple 'const' in a declaration (putting aside repeated
'const's).

IMHO, any way to mix more than one 'modifier' (not in C standard
meaning of the word, but in more general meaning) is potentially
confusing. It does not matter whether modifier is 'const' or '*' or []
or ().
However not having this ability forces programmer to use too many
typedefs. Multiple typedef are not too bad by themselves, the problem
with them is that programmer has to invent many type names and then
reader has to remember them. So in practice such enforcement ends up
less readable rather than more readable.

Post by Bart
So objectively, it IS more complicated than elsewhere with more scope
for getting it wrong.

Bart

2024-11-29 13:33:30 UTC

Post by Michael S
On Fri, 29 Nov 2024 11:04:16 +0000

I don't see your point.
How 'mut' in Rust is different from 'const' in C except for having
opposite polarity?
How 'readonly' in C# is different from 'const' in C?

I'm not familiar enough with those languages. Can mut or readonly be
used multiple times within a single type specification? Can they be used
in a context where no identifier is involved?

If so then they work like C's const does.

(My own attempts at 'readonly' used keywords that applied to definitions
only, not to types:

const [T] a = x # x must be compile-time expr
let T b := y # runtime y but b can't be reassigned
[var] T c [:= z] # normal fully mutable variable
static T d = w # compile/load-time expr

I've dropped support for explicit 'let'; it tends to be used internally
for implicit assign-once variables like loop indices; or fully readonly
data like tables.)

Post by Michael S

Constructing an arbitrary type specification by chaining together
'modifiers' is not confusing by itself:

Pointer to array 10 of const pointer to int x
x: Pointer to array 10 of const pointer to int

It IS confusing the way C does it, because:

* It breaks it up into a base-type ...

* ... plus modifiers that go before any identifier ...

* ... plus modifiers that go after the identifier

* It allows a list of variable names in the same declaration to each
have their own modifiers, so each can be a totally different type

* This means the type of a variable in a list can be split up into three
disjoint parts

* It has 'const' that can go before OR after a basic type, or after
a modifier that goes before an identifier, but not before or after a
modifier that goes after an identifier (so no const array/function)

* The 'root' of the type, what would go on the left if expressed in
LTR form, is somewhere in the middle of each typespec, starting
near the identifier ...

* ... except that often there is no identifier, then you have to apply
an algorithm to figure out what it means

* There are precedence rules which means often having to use parentheses
to get the typespec that you want: T*A[] is different from T(*A)[].

Apart from these minor points, typespecs in C are simple!

But if C types were LTR, and not split up, then figuring out whether the
variable you're declaring was readonly is easy: there would be 'const'
on the extreme left.

(I would also impose a rule that a 'const' on the left could not appear
within a typedef, only at the point the type is used.)
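
For readers who want to check the precedence point and the "pointer to array 10 of const pointer to int" reading against a compiler, here is a minimal sketch, assuming a C11 compiler with _Generic. The names A1, A2 and the two size functions are made up for illustration, with int standing in for T.

```c
#include <assert.h>
#include <stddef.h>

int *A1[4];              /* array 4 of pointer to int ([] binds tighter than *) */
int (*A2)[4];            /* pointer to array 4 of int */
int *const (*x)[10];     /* pointer to array 10 of const pointer to int */

/* Compile-time checks that each declarator means what the comment says. */
static_assert(_Generic(&A1, int *(*)[4]: 1, default: 0), "A1 is an array of pointers");
static_assert(_Generic(A2, int (*)[4]: 1, default: 0), "A2 is a pointer to an array");
static_assert(_Generic(x, int *const (*)[10]: 1, default: 0), "x matches the LTR reading");

/* The two objects also differ in size: four pointers versus one pointer. */
size_t size_of_A1(void) { return sizeof A1; }
size_t size_of_A2(void) { return sizeof A2; }
```

The C spelling of x has to be read from the identifier outwards, which is the complaint made in the list above.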

Michael S

2024-11-29 14:15:17 UTC

On Fri, 29 Nov 2024 13:33:30 +0000

Post by Michael S
On Fri, 29 Nov 2024 11:04:16 +0000

Post by Bart
My original point was trying to address why 'const' as used in C
might be confusing. I was trying to compare how non-mutable
variables are designated in other languages. There it's a keyword
that is part of the declaration syntax, not the type syntax.

I don't see your point.
How 'mut' in Rust is different from 'const' in C except for having
opposite polarity?
How 'readonly' in C# is different from 'const' in C?

I'm not familiar enough with those languages. Can mut or readonly be
used multiple times within a single type specification? Can they be
used in a context where no identifier is involved?
If so then they work like C's const does.
(My own attempts at 'readonly' used keywords that applied to
const [T] a = x # x must be compile-time expr
let T b := y # runtime y but b can't be reassigned
[var] T c [:= z] # normal fully mutable variable
static T d = w # compile/load-time expr
I've dropped support for explicit 'let'; it tends to be used
internally for implicit assign-once variables like loop indices; or
fully readonly data like tables.)

Post by Michael S

Post by Bart
I suggested that C is confusing because 'const' looks as though
it's like the former, but it's part of the latter. Which also
means you can have multiple 'const' in a declaration (putting
aside repeated 'const's).

Constructing an arbitrary type specification by chaining together
Pointer to array 10 of const pointer to int x
x: Pointer to array 10 of const pointer to int
* It breaks it up into a base-type ...
* ... plus modifiers that go before any identifier ...
* ... plus modifiers that go after the identifier

Yes, presence of both pre and post 'modifiers' makes things much
worse.

Post by Bart
* It allows a list of variable names in the same declaration to each
have their own modifiers, so each can be a totally different type

Not in every context. It is not allowed in function prototypes. Even
when it is allowed, it's never necessary and avoided by majority of
experienced programmers.
I'd guess, TimR will disagree with the last part.

Post by Bart
* This means the type of a variable in a list can be split up into
three disjoint parts
* It has 'const' that can go before OR after a basic type, or after
a modifier that goes before an identifier, but not before or after
a modifier that goes after an identifier (so no const array/function)
* The 'root' of the type, what would go on the left if expressed in
LTR form, is somewhere in the middle of each typespec, starting
near the identifier ...
* ... except that often there is no identifier, then you have to apply
an algorithm to figure out what it means
* There are precedence rules which means often having to use
parentheses to get the typespec that you want: T*A[] is different
from T(*A)[].
Apart from these minor points, typespecs in C are simple!
But if C types were LTR, and not split up, then figuring out whether
the variable you're declaring was readonly is easy: there would be
'const' on the extreme left.
(I would also impose a rule that a 'const' on the left could not
appear within a typedef, only at the point the type is used.)

David Brown

2024-11-29 16:42:54 UTC

Post by Michael S
On Fri, 29 Nov 2024 13:33:30 +0000

Post by Bart
* It allows a list of variable names in the same declaration to each
have their own modifiers, so each can be a totally different type

They can't have "totally different" types - they can have added
indirection or array indicators, following C's philosophy of describing
the type by how the variable is used:

int x, *y, z[10];

Thus "x", "*y" and "z[i]" are all of type "int".
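
A small sketch of that statement, assuming a C11 compiler: the IS_INT macro and the elements_in_z function are made up for illustration. The expressions x, *y and z[0] all have type int, while the declared objects y and z have pointer and array types.

```c
#include <assert.h>
#include <stddef.h>

int x, *y, z[10];

/* Hypothetical helper macro: 1 if the expression has type int, else 0. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Used as the declarators are written, all three yield an int. */
static_assert(IS_INT(x), "x is int");
static_assert(IS_INT(*y), "*y is int");
static_assert(IS_INT(z[0]), "z[0] is int");

/* The objects themselves are an int, a pointer to int and an array 10 of int. */
static_assert(_Generic(&y, int **: 1, default: 0), "y is a pointer to int");
static_assert(_Generic(&z, int (*)[10]: 1, default: 0), "z is an array 10 of int");

size_t elements_in_z(void) { return sizeof z / sizeof z[0]; }
```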

C allows this, but I personally would be happier if it did not. As
Michael says below, most serious programmers don't write such code.

Post by Michael S
Not in every context. It is not allowed in function prototypes. Even
when it is allowed, it's never necessary and avoided by majority of
experienced programmers.
I'd guess, TimR will disagree with the last part.

Bart

2024-11-29 18:26:41 UTC

Post by Michael S
On Fri, 29 Nov 2024 13:33:30 +0000

Post by Bart
* It allows a list of variable names in the same declaration to each
have their own modifiers, so each can be a totally different type

They can't have "totally different" types - they can have added
indirection or array indicators, following C's philosophy of describing
int x, *y, z[10];
Thus "x", "*y" and "z[i]" are all of type "int".

C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

Usually, given these declarations:

int A[100];
int *B;
int (*C)();

people would consider the types of A, B and C to be array, pointer and
function pointer respectively. Otherwise, which of the 4 or 5 possible
types would you say that D has here:

int D[3][4][5];

It depends on how it is used in an expression, which can be any of &D,
D, D[i], D[i][j], D[i][j][k], none of which include 'Array' type!

Here's another puzzler:

const int F();

why is 'const' allowed here? There is no storage involved. It's not as
though you could write 'F = 0' if there was no 'const'.

Post by David Brown
C allows this, but I personally would be happier if it did not. As
Michael says below, most serious programmers don't write such code.

It doesn't matter. If you're implementing the language, you need to
allow it.

If trying to figure out why some people have trouble understanding, it's
something to consider.

It's also something to keep in mind if trying to understand somebody
else's code: are they making use of that feature or not?

So this is a wider view than just dismissing design misfeatures
because you personally won't use them.

With the kind of C I would write, you could discard everything after
C99, and even half of C99, because the subset I personally use is very
conservative.

Keith Thompson

2024-11-29 20:35:15 UTC

Bart <***@freeuk.com> writes:
[...]

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

Yes (except that it's a declaration, not a statement):

int i = 42, F(int, int, int, int, int, int, int,
int, int, int, int, int, int, int);

Are you under the impression that anyone here was not already aware of
that? Would you prefer it if the number of parameters were arbitrarily
restricted to 13?

Do you think that anyone would actually write code like the above?

C generally doesn't impose arbitrary restrictions. Because of that,
it's possible to write absurd code like the declaration above. 99% of
programmers simply don't do that, so it's not a problem in practice.

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Neither F nor i is a type. i is an object (of type int), and F is a
function (of type int(int, int, int, int, int, int, int, int, int, int,
int, int, int, int)).

Post by Bart
That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

It's relevant to the syntax. i and F can be declared in the same
declaration only because the type of i and the return type of F happen
to be the same. If F returned void, i and F would have to be declared
separately.

Which, of course, is a good idea anyway.
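
A shortened sketch of the same point, with two parameters instead of fourteen. The function G and the bodies of F and G are made up for illustration.

```c
#include <assert.h>

int i = 42, F(int, int);   /* one declaration: an int object and a function returning int */
void G(void);              /* a different return type needs its own declaration */

static_assert(_Generic(i, int: 1, default: 0), "i is an object of type int");
static_assert(_Generic(F, int (*)(int, int): 1, default: 0), "F is a function returning int");

int F(int a, int b) { return a + b + i; }
void G(void) { i = 0; }
```

In the second check F is converted to a pointer to function, which is why the association names int (*)(int, int) rather than the function type itself.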

You're posting repeatedly trying to convince everyone that C allows
ridiculous code. We already know that. You are wasting everyone's time
telling us something that we already know. Most of us just don't obsess
about it as much as you do. Most of us recognize that, however
convoluted C's declaration syntax might be, it cannot be fixed in a
language calling itself "C".

Most of us here are more interested in talking about C as it's
specified, and actually trying to understand it, than in complaining
about it.

Post by Bart
int A[100];
int *B;
int (*C)();
people would consider the types of A, B and C to be array, pointer and
function pointer respectively. Otherwise, which of the 4 or 5 possible
int D[3][4][5];
It depends on how it is used in an expression, which can be any of &D,
D, D[i], D[i][j], D[i][j][k], none of which include 'Array' type!

No, the object D unambiguously has type int[3][4][5], or as cdecl
explains it "array 3 of array 4 of array 5 of int". The *expression* D
may have type int[3][4][5] or int(*)[4][5] ("pointer to array 4 of array
5 of int"), depending on the context.

In particular, in &D, the subexpression D is of array type.

You just need to know about implicit array-to-pointer conversions.
Of course you know all about that, but you don't mention it so it
seems more confusing. You know this better than you pretend to.
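
The types involved can be checked mechanically. This is a sketch assuming a C11 compiler in which the controlling expression of _Generic undergoes array-to-pointer conversion (as current gcc and clang do); d_total_elements is a made-up name. D converts to a pointer to its first element, and that element is an int[4][5].

```c
#include <assert.h>
#include <stddef.h>

int D[3][4][5];

/* Operand of & or sizeof: D keeps its array type. */
static_assert(_Generic(&D, int (*)[3][4][5]: 1, default: 0), "&D points to the whole array");
static_assert(sizeof D == 3 * 4 * 5 * sizeof(int), "sizeof sees the whole array");

/* Elsewhere each array expression converts to a pointer to its first element. */
static_assert(_Generic(D, int (*)[4][5]: 1, default: 0), "D decays to int(*)[4][5]");
static_assert(_Generic(D[0], int (*)[5]: 1, default: 0), "D[i] decays to int(*)[5]");
static_assert(_Generic(D[0][0], int *: 1, default: 0), "D[i][j] decays to int*");
static_assert(_Generic(D[0][0][0], int: 1, default: 0), "D[i][j][k] is an int");

size_t d_total_elements(void) { return sizeof D / sizeof D[0][0][0]; }
```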

Post by Bart
const int F();
why is 'const' allowed here? There is no storage involved. It's not as
though you could write 'F = 0' if there was no 'const'.

You're right that "const" isn't meaningful in that particular context.
I suspect that it's allowed because adding a rule to forbid it would
have made the standard slightly more complicated with no particular
benefit.

Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

[...]

Once again, everyone here already knows that C's declaration syntax can
be confusing, and perhaps other languages do it better. Most of us
would rather try to understand it than whine about it.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Bart

2024-11-29 21:52:11 UTC

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

int i = 42, F(int, int, int, int, int, int, int,
int, int, int, int, int, int, int);
Are you under the impression that anyone here was not already aware of
that? Would you prefer it if the number of parameters were arbitrarily
restricted to 13?
Do you think that anyone would actually write code like the above?
C generally doesn't impose arbitrary restrictions. Because of that,
it's possible to write absurd code like the declaration above. 99% of
programmers simply don't do that, so it's not a problem in practice.

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Neither F nor i is a type. i is an object (of type int), and F is a
function (of type int(int, int, int, int, int, int, int, int, int, int,
int, int, int, int)).

Post by Bart
That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

It's relevant to the syntax. i and F can be declared in the same
declaration only because the type of i and the return type of F happen
to be the same. If F returned void, i and F would have to be declared
separately.
Which, of course, is a good idea anyway.
You're posting repeatedly trying to convince everyone that C allows
ridiculous code. We already know that. You are wasting everyone's time
telling us something that we already know. Most of us just don't obsess
about it as much as you do. Most of us recognize that, however
convoluted C's declaration syntax might be, it cannot be fixed in a
language calling itself "C".
Most of us here are more interested in talking about C as it's
specified, and actually trying to understand it, than in complaining
about it.

No, the object D unambiguously has type int[3][4][5]

(So it would have a different type from E declared in the same
declaration:

int D[3][4][5], E;

? In that case tell that to David Brown!)

You seem to have missed the point of my post, which was a reply to David's
remark that 'they can't have totally different types' which was in
response to my saying that each variable in the same declaration can 'be
[of] a totally different type'.

DB is assuming the type of the variable after it's been used in an
expression that is fully evaluated to yield its base type. So my A[100]
is used as A[i], and D[3][4][5] is used as D[i][j][k].

But of course they may be evaluated only partially, yielding a range of
types.

Post by Keith Thompson
Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a variable.
There is no data storage associated with it.

In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

Keith Thompson

2024-11-29 23:44:00 UTC

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Neither F nor i is a type. i is an object (of type int), and F is a
function (of type int(int, int, int, int, int, int, int, int, int, int,
int, int, int, int)).

Post by Bart
That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

It's relevant to the syntax. i and F can be declared in the same
declaration only because the type of i and the return type of F happen
to be the same. If F returned void, i and F would have to be declared
separately.
Which, of course, is a good idea anyway.
You're posting repeatedly trying to convince everyone that C allows
ridiculous code. We already know that. You are wasting everyone's time
telling us something that we already know. Most of us just don't obsess
about it as much as you do. Most of us recognize that, however
convoluted C's declaration syntax might be, it cannot be fixed in a
language calling itself "C".
Most of us here are more interested in talking about C as it's
specified, and actually trying to understand it, than in complaining
about it.

No, the object D unambiguously has type int[3][4][5]

(So it would have a different type from E declared in the same
int D[3][4][5], E;
? In that case tell that to David Brown!)

Yes, of course D and E have different types. I'm certain he's
aware of that.

I wrote that the object D is unambiguously of type int[3][4][5], and the
expression D can be of the array type int[3][4][5] or of the pointer
type int(*)[4][5], depending on the context. Do you agree? Or do you
still claim that D can have any of "4 or 5 possible types"?

(Note that I'm not talking about the type of the expression D[i] or of
any other expression that includes D as a subexpression.)

Post by Bart
You seem to have missed the point of my post, which was a reply to
David's remark that 'they can't have totally different types' which
was in response to my saying that each variable in the same
declaration can 'be [of] a totally different type'.

David apparently has a different definition of "totally different types"
than you do. Since the standard doesn't define that phrase, I suggest
not wasting time arguing about it.

Given:
int D[3][4][5], E;
the object D is of type int[3][4][5], and E is of type int. Do you
understand that?

If you wanted to change the type of D from int[3][4][5] to
double[3][4][5], you'd have to use two separate declarations.
Do you understand that? (Of course you do, but will you admit that
you understand it?)

I think that distinction is what David had in mind. double[3][4][5] and
int are "totally different types", but int[3][4][5] and int are not.
Entities of "totally different types" cannot be declared in a single
declaration. You don't have to accept that meaning of the phrase (which
I find a bit vague), but it's clearly what David meant.

The point is that there are restrictions on what can be combined into a
single declaration. But these days it's usually considered good style
to declare only one identifier in each declaration, so while this:
int i, *p;
is perfectly valid, and every C compiler must accept it, this:
int i;
int *p;
is preferred by most C programmers.

Do you understand that?

Post by Bart
DB is assuming the type of the variable after it's been used in an
expression that is fully evaluated to yield its base type. So my
A[100] is used as A[i], and D[3][4][5] is used as D[i][j][k].
But of course they may be evaluated only partially, yielding a range
of types.

What "range of types" do you think D can have?

Post by Keith Thompson
Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a
variable. There is no data storage associated with it.

In C, declarations can declare objects, functions, types, etc. I fail
to see how your language is relevant.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.

Function declarations outside header files are valid, but tend to be
rare in well-written C code.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Waldek Hebisch

2024-11-30 00:55:01 UTC

Post by Bart
It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a
variable. There is no data storage associated with it.

In C, declarations can declare objects, functions, types, etc. I fail
to see how your language is relevant.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

Hmm, in well-written code static functions are likely to be a
majority. Some people prefer to declare all functions and
put declarations of static functions in the same file as the
functions themselves. Consequently, function declarations are not
rare in such code. Do you consider it well-written?

--
Waldek Hebisch

Keith Thompson

2024-11-30 01:02:47 UTC

[...]

Post by Waldek Hebisch

Post by Keith Thompson
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

Sure, I missed that case.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

James Kuyper

2024-11-30 01:38:51 UTC

On 11/29/24 19:55, Waldek Hebisch wrote:
...

Post by Waldek Hebisch
Hmm, in well-written code static functions are likely to be a
majority. Some people prefer to declare all functions and
put declarations of static functions in the same file as the
functions themselves. Consequently, function declarations are not
rare in such code. Do you consider it well-written?

I wouldn't go so far as to say that it's poorly written, but I don't
like the unnecessary redundancy of that approach. Whenever possible, I
prefer to let each static function's definition serve as its only
declaration. This isn't possible, for instance, if you have a pair of
mutually recursive functions.

The redundancy between a header file's function declaration and the
corresponding function definition is necessary, given the way that C
works. Avoiding that is one of the reasons I like declaring static
functions, where appropriate.
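
The mutually recursive case is the one where a separate declaration of a static function cannot be avoided. A minimal sketch, with made-up function names:

```c
#include <assert.h>
#include <stdbool.h>

/* is_even calls is_odd before is_odd is defined, so this one
   forward declaration is required. */
static bool is_odd(unsigned n);

/* The definition of is_even serves as its only declaration. */
static bool is_even(unsigned n) { return n == 0 ? true : is_odd(n - 1); }
static bool is_odd(unsigned n)  { return n == 0 ? false : is_even(n - 1); }

/* The only function with external linkage; it would be the one
   declared in a header. */
bool parity_is_even(unsigned n) { return is_even(n); }

void parity_self_check(void)
{
    assert(is_even(10));
    assert(is_odd(7));
}
```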

Michael S

2024-11-30 17:08:29 UTC

On Fri, 29 Nov 2024 20:38:51 -0500

Post by James Kuyper
...

I wouldn't go so far as to say that it's poorly written, but I don't
like the unnecessary redundancy of that approach. Whenever possible, I
prefer to let each static function's definition serve as its only
declaration. This isn't possible, for instance, if you have a pair of
mutually recursive functions.
The redundancy between a header file's function declaration and the
corresponding function definition is necessary, given the way that C
works. Avoiding that is one of the reasons I like declaring static
functions, where appropriate.

Top-down-minded people don't like details textually preceding "big
picture".

[O.T.]
Better solution would be if static function definition anywhere in the
file serves like declaration (prototype) for the whole file, including
preceding part. We are long past the time where single-pass compiler
was a legit argument against such arrangement. Nowadays the only
possible counter argument would be breaking existing code. But I don't
see how such change breaks anything.

David Brown

2024-12-01 13:50:40 UTC

Post by Waldek Hebisch

In C, declarations can declare objects, functions, types, etc. I fail
to see how your language is relevant.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

Without doubt, most functions (and non-local data) should be static.

However, IMHO writing (non-defining) declarations for your static
functions is a bad idea unless it is actually necessary to the code
because you are using them in function pointers or have particularly
good reasons for the way you order your code.

I don't find redundant declarations of static functions at all useful -
and I find them of significant cost in maintaining files. It is far too
easy to forget to update them when you change, delete or add new
functions. And a list of such declarations that you don't feel you can
trust entirely, is worse than useless.

Such lists might have been helpful to some people decades ago, when
editors were more primitive. If I need a list of functions in a file
(maybe it's someone else's code, or old code of mine), any programmer's
editor or IDE will give me it - updated correctly in real-time, and not
out of sync.

Bart

2024-12-01 14:23:30 UTC

Without doubt, most functions (and non-local data) should be static.

I have a tool that translates C programs to my syntax. Most functions of
codebases I tried are marked 'global', because the C version did not use
'static'.

Generally those functions don't need to be exported. This is just
laziness or ignorance on the part of the programmer, not helped by C using
the wrong default.

Post by David Brown
However, IMHO writing (non-defining) declarations for your static
functions is a bad idea unless it is actually necessary to the code
because you are using them in function pointers or have particularly
good reasons for the way you order your code.

A good reason might be NOT CARING how the code is ordered. I have to
constantly keep that in mind when writing C programs. I don't like
moving functions about after they've been written, it is easier to add
a forward declaration, even though I'd rather not do that at all.

It is just another annoyance.

Some people may also prefer top-down ordering of their functions.

Post by David Brown
I don't find redundant declarations of static functions at all useful -
and I find them of significant cost in maintaining files. It is far too
easy to forget to update them when you change, delete or add new
functions. And a list of such declarations that you don't feel you can
trust entirely, is worse than useless.

Why doesn't the compiler report a declaration that doesn't match the
definition?

Post by David Brown
Such lists might have been helpful to some people decades ago, when
editors were more primitive. If I need a list of functions in a file
(maybe it's someone else's code, or old code of mine), any programmer's
editor or IDE will give me it - updated correctly in real-time, and not
out of sync.

Why isn't this a problem for exported/shared functions?

That is, for all sorts of functions and variables declared in headers
where there is a declaration in header, and a definition in some 'home'
module.

David Brown

2024-12-01 15:50:15 UTC

Without doubt, most functions (and non-local data) should be static.

I have a tool that translates C programs to my syntax. Most functions of
codebases I tried are marked 'global', because the C version did not use
'static'.
Generally those functions don't need to be exported. This is just
laziness or ignorance on the part of the programmer, not helped by C using
the wrong default.

I agree that the default for a language should be small scope (in this
case, functions should be static to the file) rather than large scope.
But it's not difficult to have a habit of adding "static" to all your
functions that are not exported - even for a C generator.

A good reason might be NOT CARING how the code is ordered.

I care how my code is organised and structured. I care, regardless of
whether or not the language cares. I might be a little freer in the
ordering if the language does not require or promote a particular
ordering - but I pick the order intentionally.

I'd be quite happy with code in Python, or any other language which does
not impose a bottom-up ordering like C, to be written using a top-down
approach where each function calls other functions defined below it.
I'd also be happy with major algorithmic functions written one place and
small utility functions in a different section, or publicly callable
code first then private internal code. There's lots of organisation of
code that can make sense.

I'd not be impressed with "don't care" as an organisation.

Post by David Brown
I don't find redundant declarations of static functions at all useful
- and I find them of significant cost in maintaining files. It is far
too easy to forget to update them when you change, delete or add new
functions. And a list of such declarations that you don't feel you
can trust entirely, is worse than useless.

Why doesn't the compiler report a declaration that doesn't match the
definition?

If the declarations are a mismatch - you have a function declared and
defined with different parameter or return types - it is an error. But
it is not an error to declare static functions that are never defined as
long as they are never used - so renamed or deleted functions can have
their declarations left behind. ("gcc -Wall" will warn about this, but
you would not want to use a tool that is potentially helpful.) And
defining and using a static function without a forward declaration is
rarely considered something to warn about, so missing declarations in
the list will not be diagnosed. (Maybe clang or clang-tidy have
warnings for that - they support more warnings than gcc.)
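
A minimal C sketch of the situation described above, assuming a single source file. The forward declaration exists only so that the caller can appear first; the compiler checks it against the later definition. The names report and scale are hypothetical.

```c
/* Forward declaration: needed only because report() is defined
   before scale() (top-down ordering). */
static int scale(int x);

/* Uses scale() before its definition appears in the file. */
int report(int x)
{
    return scale(x) + 1;
}

/* The definition must match the declaration above. Changing this to,
   say, `static long scale(long x)` without updating the declaration
   gives conflicting types, which the compiler must diagnose. */
static int scale(int x)
{
    return x * 2;
}
```

A leftover declaration such as "static int removed(int);" for a function that is never defined and never called would still compile; gcc reports it under -Wall through -Wunused-function.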

Post by David Brown
Such lists might have been helpful to some people decades ago, when
editors were more primitive. If I need a list of functions in a file
(maybe it's someone else's code, or old code of mine), any
programmer's editor or IDE will give me it - updated correctly in
real-time, and not out of sync.

Why isn't this a problem for exported/shared functions?
That is, for all sorts of functions and variables declared in headers
where there is a declaration in header, and a definition in some 'home'
module.

What do you mean here?

I certainly consider it a weakness in C that you don't have clear
requirements and limitations for what can be in a header or a C file, or
how things can be mixed and matched. Keeping code clear and
well-ordered therefore requires discipline and standardised arrangement
of code and declarations. Different kinds of projects will have
different requirements here, but for my own code I find it best to be
strict that for any C file "file.c", there will be a header "file.h"
which contains "extern" declarations of any exported functions or data,
along with any type declarations needed to support these. My tools will
warn on any mismatches, such as non-static functions without a matching
"extern" declaration. They can't catch everything - the way C is built
up, there is no distinction between external declarations that should be
defined in the same module and ones that are imported from elsewhere.
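
A sketch of the file.c / file.h convention described above, squeezed into one listing so it can be compiled as is. The module name counter and its function are hypothetical, and the post does not say which tools produce the warnings; gcc's -Wmissing-prototypes is one option that warns when a non-static function is defined without a previous prototype.

```c
/* ---- counter.h (hypothetical header): the exported interface ---- */
extern int counter_next(void);

/* ---- counter.c: would begin with #include "counter.h" ---- */

/* File-local state: static, so not visible to other modules. */
static int count;

/* Exported function. Because the declaration above is visible here,
   the compiler checks that it agrees with this definition. */
int counter_next(void)
{
    return ++count;
}
```

If counter.c did not include its own header, a mismatch between the two would go unnoticed by the compiler, since each translation unit is checked separately.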

Bart

2024-12-01 20:12:25 UTC

Post by David Brown
Such lists might have been helpful to some people decades ago, when
editors were more primitive. If I need a list of functions in a file
(maybe it's someone else's code, or old code of mine), any
programmer's editor or IDE will give me it - updated correctly in
real-time, and not out of sync.

Why isn't this a problem for exported/shared functions?
That is, for all sorts of functions and variables declared in headers
where there is a declaration in header, and a definition in some
'home' module.

What do you mean here?

You said you didn't want a list of declarations to maintain for static
functions within a module.

But for non-static functions, which are shared via a header, you /need/
such a list to be maintained:

prog.h:   int F(int);

prog.c:   #include "prog.h"

          static int G(int a);

          int F(int a) {return 0;}

          static int G(int a) {return 0;}

Here, you object to having to maintain the declaration for G, but you
still need to do so for F, and inside a separate file.

The declaration for F could also get out of sync, but you don't consider
that a problem?

And if it isn't because your tools help with this, then they can help
with G too.

Post by David Brown
I certainly consider it a weakness in C that you don't have clear
requirements and limitations for what can be in a header or a C file, or
how things can be mixed and matched. Keeping code clear and
well-ordered therefore requires discipline and standardised arrangement
of code and declarations. Different kinds of projects will have
different requirements here, but for my own code I find it best to be
strict that for any C file "file.c", there will be a header "file.h"
which contains "extern" declarations of any exported functions or data,
along with any type declarations needed to support these. My tools will
warn on any mismatches, such as non-static functions without a matching
"extern" declaration. They can't catch everything - the way C is built
up, there is no distinction between external declarations that should be
defined in the same module and ones that are imported from elsewhere.

Yes, this is why a module scheme (such as the kind I use) is invaluable.

In the example above, you'd define both F and G in one place. There is
no header and there are no separate declarations.

If another module wishes to use F, then it imports the whole module that
defines F.

Some schemes can selectively import individual functions, but to me
that's pointless micro-managing.

In my scheme, it is not even necessary for individual modules to
explicitly import each other: a simple list of modules is provided in
one place, and they will automatically import each others' exported
entities (which include functions, variables, types, enums, structs,
named constants, and macros).

Janis Papanagnou

2024-12-01 15:37:01 UTC

Without doubt, most functions (and non-local data) should be static.

Most functions should be part of a 'class' declaration.

*oops* - wrong newsgroup. ;-)

Janis

Bart

2024-11-30 00:57:56 UTC

Post by Bart
(So it would have a different type from E declared on in the same
int D[3][4][5], E;
? In that case tell that to David Brown!)

Yes, of course D and E have different types. I'm certain he's
aware of that.

Apparently the quibble is about the meaning of 'totally different'. I
would have thought that 'incompatible' would have covered it.

But it looks like, for the sake of argument, types aren't 'totally'
different if they have even the slightest point of similarity. So an
'int' type, and a bloody great function, not even a real type, must be
considered somewhat identical if the latter happens to return an int?

In my language which you despise but provides a great perspective,
variables declared in the same declaration have 100% the same type. If
they are even 1% different, then that is a separate type and they need
their own declarations. There are no gradations!

Post by Keith Thompson
What "range of types" do you think D can have?

If DB is talking about the type of D[i][j][k], then it is also necessary
to consider the types of D, D[i] etc. That's why it's not useful to talk
about anything other than the type of the value stored in D (and in C,
before D is used in any expression).

Post by Keith Thompson
Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a
variable. There is no data storage associated with it.

In C, declarations can declare objects, functions, types, etc.

Actually types would be another category, that can also start off
looking like a variable declaration.

Post by Keith Thompson
to see how your language is relevant.

Because it's confusing in C. The perspective of a quite different syntax
/for the same class of language/ makes that extra clear to me.

In C all declarations are based around the syntax as used for variables,
even function definitions.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.

Yes it is, I'm not pretending at all.

Perhaps you can post a trivial bit of C code which reads in C source
code and shows the first lines of all the function definitions, not
prototypes nor function pointers. It can assume that each starts at the
beginning of a line.

However each may start with a user-defined type or there may be macros
in any position.
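
To make the challenge concrete, here is a deliberately naive line-based heuristic in C. It is only a sketch under the stated assumption that every definition starts at the beginning of a line; the function name is hypothetical, and it does not run the preprocessor or parse anything, so macros and user-defined types can fool it.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the line looks like the first line of a function
   definition, 0 otherwise. Heuristic only: starts in column 0 with an
   identifier character, contains '(' and does not end with ';'. */
int looks_like_function_definition(const char *line)
{
    size_t len = strlen(line);

    /* Must start at the beginning of the line with a name. */
    if (len == 0 || !(isalpha((unsigned char)line[0]) || line[0] == '_'))
        return 0;

    /* Ignore trailing whitespace. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    /* Prototypes and variable declarations end with ';'. */
    if (line[len - 1] == ';')
        return 0;

    /* A parameter list needs an opening parenthesis. */
    return strchr(line, '(') != NULL;
}
```

It accepts "int foo(int a) {" and rejects "int foo(int);", but it also accepts a bare macro invocation such as "DEFINE_HANDLER(foo)", which may or may not expand to a definition.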

I can tell that in my syntax, function definitions start with a line
like this ([...] means optional; | separates choices):

['global'|'export'] 'func'|'proc' name ...

Which one do you think would be easier? (Function declarations are
generally not used.)

Post by Keith Thompson
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

Function declarations are everywhere. They are usually needed for static
functions, otherwise you will have hundreds of function definitions that
must be written and maintained in a specific order.

Keith Thompson

2024-11-30 01:28:22 UTC

Post by Bart
(So it would have a different type from E declared on in the same
int D[3][4][5], E;
? In that case tell that to David Brown!)

Yes, of course D and E have different types. I'm certain he's
aware of that.

Apparently the quibble is about the meaning of 'totally different'. I
would have thought that 'incompatible' would have covered it.

No, "incompatible" has a specific meaning in C which is unrelated to
what we're talking about.

Post by Bart
But it looks like, for the sake of argument, types aren't 'totally'
different if they have even the slightest point of similarity. So an
'int' type, and a bloody great function, not even a real type, must be
considered somewhat identical if the latter happens to return an int?

I won't try to speak for David. I speculated about what he meant by
what I presume was a throwaway phrase used in a specific context.
(I probably read it at the time, but I don't remember.) I will not
discuss the meaning of "totally different types" any further.

Post by Bart
In my language which you despise but provides a great perspective,
variables declared in the same declaration have 100% the same type. If
they are even 1% different, then that is a separate type and they need
their own declarations. There are no gradations!

You claim that I "despise" your language.

That is a lie.

I'm simply not interested in it. If I were, I'd be glad to discuss it
in, say, comp.lang.misc.

It is a fact that a C declaration can declare multiple entities,
possibly of different types, but with restrictions on which types can be
combined. I have not expressed an opinion on whether I think that's a
good idea. It's a feature that I rarely take advantage of.

Post by Keith Thompson
What "range of types" do you think D can have?

If DB is talking about the type of D[i][j][k], then it is also
necessary to consider the types of D, D[i] etc. That's why it's not
useful to talk about anything other than the type of the value stored
in D (and in C, before D is used in any expression).

The definition of D was :
int D[3][4][5];
I don't know whether DB was talking about "the type of D[i][j][k]"
(which is obviously int, BTW). I was talking about the type of D.
I've already told you what type the object D has, and the two types
the expression D can have depending on context. I note your refusal
to answer.
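
A short C sketch of the distinction being drawn here. The object D has type int[3][4][5]; as an expression it keeps that type when it is the operand of sizeof, and is otherwise converted to a pointer to its first element, of type int (*)[4][5]. The two helper function names are hypothetical.

```c
#include <stddef.h>

int D[3][4][5], E;   /* one declaration, two different types */

/* As the operand of sizeof, D keeps its array type int[3][4][5]. */
size_t size_of_D(void)
{
    return sizeof D;
}

/* In most other expressions D is converted to a pointer to its first
   element, so the value returned here has type int (*)[4][5]. */
int (*first_element_of_D(void))[4][5]
{
    return D;
}
```

D[i][j][k] is an int, the same type as E, but that is the type of an element, not of D itself.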

Post by Keith Thompson
Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a
variable. There is no data storage associated with it.

In C, declarations can declare objects, functions, types, etc.

Actually types would be another category, that can also start off
looking like a variable declaration.

Yes, I mentioned types. Your point?

Post by Keith Thompson
to see how your language is relevant.

Because it's confusing in C. The perspective of a quite different
syntax /for the same class of language/ makes that extra clear to me.
In C all declarations are based around the syntax as used for
variables, even function definitions.

Yes, C declarations mostly follow a "declaration follows use"
pattern. Everyone knows that. Many, perhaps most, C programmers
don't particularly like it, but spend little time complaining about
it. I have yet to see you make an interesting point about that fact.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.

No. It's straightforward for an experienced C programmer looking at
code that's not deliberately obscure. A program that can do the same
thing reliably would have to include a C preprocessor and parser.

It's obvious that :
int foo(int);
is intended to be a function declaration, but if there's a visible macro
definition :
#define foo(n) foo
then it's an object declaration. No "trivial bit of C code" will handle
that kind of thing.
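
The macro case can be checked with a few lines of C. This is a sketch built on the two lines quoted above; the function store_and_load is hypothetical and is there only to show that foo really is an int object.

```c
/* A function-like macro that discards its argument. */
#define foo(n) foo

/* Looks like a function declaration, but the preprocessor rewrites
   it to `int foo;`, so this declares an object of type int. */
int foo(int);

/* foo is used here as an ordinary int variable. The name is not
   followed by '(' in this function, so the macro is not invoked. */
int store_and_load(int value)
{
    foo = value;
    return foo;
}
```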

No doubt you'll take that as evidence that C Is Bad rather than an
argument against writing bad code.

And I was talking about function declarations, not function definitions.
The goalposts were fine where they were.

Post by Bart
However each may start with a user-defined type or there may be
macros in any position.
I can tell that in my syntax, function definitions start with a line
['global'|'export'] 'func'|'proc' name ...
Which one do you think would be easier? (Function declarations are
generally not used.)

I don't care.

Yes, languages other than C can have better declaration syntax than C does
(where "better" is clearly subjective). Perhaps yours does. How many
times do I have to acknowledge that before you'll stop harping on it?

Post by Keith Thompson
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

Function declarations are everywhere. They are usually needed for
static functions, otherwise you will have hundreds of function
definitions that must be written and maintained in a specific order.

I acknowledged elsewhere that I forgot about declarations of static
functions. (Hundreds of function definitions in a single source file
seem unlikely.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Janis Papanagnou

2024-11-30 03:25:24 UTC

Post by Bart
[...]
I can tell that in my syntax, function definitions start with a line
['global'|'export'] 'func'|'proc' name ...
Which one do you think would be easier? (Function declarations are
generally not used.)

I don't care.
Yes, languages other than C can have better declaration syntax than C does
(where "better" is clearly subjective). Perhaps yours does. [...]

From the various bits and pieces spread around I saw that Bart had
obviously adopted many syntactical elements of Algol 68, and I wonder
why he hadn't just used this language (or any "better" language than
"C") if he dislikes it so much that he even implemented his own languages.
But okay.

Janis

Bart

2024-11-30 11:59:34 UTC

I don't care.
Yes, languages other than C can have better declaration syntax than C does
(where "better" is clearly subjective). Perhaps yours does. [...]

It needed to be a lower level language that could be practically
implemented on a then small machine.

Algol68 implementations were scarce especially on 8-bit systems.

But I also considered it too high level and hard to understand. Even the
syntax had features I didn't like, like keyword stropping and fiddly
rules about semicolon placement.

As for better languages than C, there were very few at that level. Even
C was not so practical: C compilers cost money (I wasn't a programmer,
my boss wouldn't pay for it!).

There would have been problems just getting it into the machine (since
on CP/M, every machine used its own disk format). And by the accounts I
read later on in old Byte magazine articles, C compilers were hopelessly
slow running on floppy disks. (Perhaps Turbo C excepted.)

By the time C might have been viable, I found that my language was
preferable.

Janis Papanagnou

2024-12-01 09:36:17 UTC

Post by Janis Papanagnou
From the various bits and pieces spread around I saw that Bart had
obviously adopted many syntactical elements of Algol 68, and I wonder
why he hadn't just used this language (or any "better" language than
"C") if he dislikes it so much that he even implemented his own languages.

It needed to be a lower level language that could be practically
implemented on a then small machine.

Okay.

Post by Bart
Algol68 implementations were scarce especially on 8-bit systems.

Indeed. (If existing at all; I can't tell.)

Post by Bart
But I also considered it too high level and hard to understand.

This I find astonishing, given that it is (IMO; and different from C)
a so cleanly defined language.

Post by Bart
Even the
syntax had features I didn't like, like keyword stropping

Stropping was a way to solve the limited characters available in the
system character sets. Practically, as an implementer, you could use
any mechanism you like. (On the mainframe I had used symbols preceded
by a dot, the Genie compiler uses uppercase, for example. None is a
problem for the implementer.)

Post by Bart
and fiddly rules about semicolon placement.

Huh? - The semicolon placement as delimiters is quite clear and (as so
many things in Algol 68) also clearly defined (IMO). - So what do you
have in mind here?

Post by Bart
As for better languages than C, there were very few at that level.

(But you know you can use Algol 68 on a system development level; we
can read that it had been done in those days. - All that's "missing",
and that's a good design decision, were pointers.)

But okay, it's of course a personal judgement what's "better", and
what you like to use to program or like to use as a paragon for an
own language design and own implementation.

Post by Bart
Even
C was not so practical: C compilers cost money (I wasn't a programmer,
my boss wouldn't pay for it!).

In those days everything cost money. (Probably with the notable
exception of UNIX, as they say, which had more sort of a fee than a
market price, at least for the universities.) But, according to some
historic sources I read, you also paid primarily for the system, not the
software; vendors supported availability of languages to sell their
hardware.

Post by Bart
There would have been problems just getting it into the machine (since
on CP/M, every machine used its own disk format). And by the accounts I
read later on in old Byte magazine articles, C compilers were hopelessly
slow running on floppy disks. (Perhaps Turbo C excepted.)

(I don't get what argument you are trying to make. - That you wanted
some terse language, maybe, as you already said above?)

Post by Bart
By the time C might have been viable, I found that my language was
preferable.

Janis

Bart

2024-12-01 11:52:17 UTC

[About Algol68]

Post by Bart
But I also considered it too high level and hard to understand.

This I find astonishing, given that it is (IMO; and different from C)
a so cleanly defined language.

Algol68 was famous for its impenetrable specification. Its Revised
Report was the programming language equivalent of James Joyce's 'Ulysses'.

I needed a clean simple syntax and 100% obvious and explicit semantics.

Post by Bart
Even the
syntax had features I didn't like, like keyword stropping

Yes, but they made writing, reading and maintaining source code
impossible. You'd spend most of your time switching case, or babysitting
semicolons (see below).

I can live without embedded spaces within identifiers - most languages
do. That was the primary reason for the stropping.

If I really need to use a reserved word as an identifier now (which only
happens if porting from another language), I can use a backtick:

int `Int, `INT, `Int

This also enables case-sensitivity (my syntax was case-insensitive). I
don't think case-stropping for example can manage that.

Post by Bart
and fiddly rules about semicolon placement.

Huh? - The semicolon placement as delimiters is quite clear and (as so
many things in Algol 68) also clearly defined (IMO). - So what do you
have in mind here?

It just makes life harder. It special-cases the last statement of any
block, which must be semicolon free, as it's strictly a separator. So:

* Adding a new statement to the end of a block, you must apply ; to the
current last statement

* Deleting the last line, you must delete the ; on the previous.

* Move any of the lines about, and you may again need to update the
semicolons if the last was included

* Temporarily comment out lines including the last, you must also
temporarily remove ; from the line before the comments

* Copy the whole block elsewhere, you might need to add ;

* Temporarily comment out a whole block (or start off with an empty
block that will be populated later) you need to use SKIP, another annoyance.

Usually you're not aware of this until the compiler tells you and you
have to go back in and fix it.

Allow semicolons to be a /terminator/, and all that goes away. It's a no
brainer. But then I don't like having to write semicolons at all, and
generally I don't.

The whole thing with stropping and semicolons is just a colossal
time-waster.

Post by Bart
As for better languages than C, there were very few at that level.

(But you know you can use Algol 68 on a system development level; we
can read that it had been done in those days. - All that's "missing",
and that's a good design decision, were pointers.)

The only actual implementation I've come across is A68G. That's an
interpreter.

It runs my Fibonacci benchmark in 16 seconds. My main /dynamic/ language
interpreter runs it in 1.3 seconds. My C interpreter, which I consider
hopelessly slow, takes 6 seconds. My unoptimised C is 0.24 seconds.

Optimised C is 0.12 seconds, 130 times faster than A68G.
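
The benchmark itself is not shown in this post. A Fibonacci benchmark of this kind is commonly the naive doubly recursive function, so the C sketch below is an assumption about its shape, not the code that produced the timings above.

```c
/* Naive doubly recursive Fibonacci. The call count grows
   exponentially with n, which is what makes it a simple test of
   function-call overhead. */
long fib(int n)
{
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}
```

On the figures quoted, 16 seconds against 0.12 seconds is a ratio of roughly 130.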

It's quite unsuited to systems programming, and not just because of its
execution speed. However, I'd quite like to see A68G implemented in A68G!

Algol68 was a fascinating and refreshing language back then. It looked
great when typeset in a book. But its practicalities were annoying, and
now it is quite dated.

(I don't get what argument you are trying to make. - That you wanted
some terse language, maybe, as you already said above?)

That there were practical problems in physically getting the program
into the machine. And when it did run, it would have taken minutes to
build anything rather than seconds.

(See: https://archive.org/details/byte-magazine-1983-08/page/n111/mode/2up

A chart of compile-times is on page 122. The same issue compares 8086 C
compilers, and introduces the C language.)

A lot of my HLL programs were short and intended to test some hardware.
My resident in-memory compiler translated them more or less instantly. A
formal build using disk-based programs, source files, object files, and
linkers would have been too time-consuming (little has changed!).

I was generally regarded as a whizz-kid; that would have been difficult
to keep up if my boss saw me twiddling my thumbs every time he looked in.

Janis Papanagnou

2024-12-01 15:08:49 UTC

Post by Bart
[About Algol68]

Post by Bart
But I also considered it too high level and hard to understand.

This I find astonishing, given that it is (IMO; and different from C)
a so cleanly defined language.

Algol68 was famous for its impenetrable specification. Its Revised
Report was the programming language equivalent of James Joyce's 'Ulysses'.

I see, you mostly find the specification document hard to understand,
and you have a point. (Not that I think that the C standard documents
texts that are occasionally posted here would be easy to understand.
Standards are certainly no textbooks. Myself, for example, I had the
arguable honor to implement an X.500 system for a European telephone
directory collaboration; the ITU-T X.500 series was vastly copious.)

Where I'd disagree is if you mean that Algol 68 as a language is, from
a programmer's perspective, hard to understand. Compared to "C" I think
it's a lot simpler (despite its richness of language features) due to
its clear formal design (and as opposed to "C" with all its design
quirks). And that's also true if compared with languages that have a
more streamlined design. (YMMV.)

Post by Bart
I needed a clean simple syntax and 100% obvious and explicit semantics.

Looks like we agree. (But you obviously came to a different conclusion
than me.)

Post by Janis Papanagnou
[ Stropping ]

Yes, but they made writing, reading and maintaining source code
impossible. [...]

Really? - For me it's exactly the opposite; having the keywords stand
out lexically (or graphically) is what adds to legibility! (Even if I
design and formulate an algorithm as a sketch on paper or white board
I underline the keywords. On a contemporary terminal you'd use syntax
highlighting [ that some folks, as I've learned, also don't like ].)

(I admit that hitting the Shift or the Caps-Lock key may be considered
cumbersome by [some/most] people. - I "pay the price" for legibility.)

Post by Bart
[...]
If I really need to use a reserved word as an identifier now [...]

(BTW, some language allows that by context sensitivity; in Unix shell,
for example, you can write for for in in ; do : ; done without
"stropping".)

Post by Bart
[ snip examples of "Bart's language" ]

(It makes no sense to compare Algol 68 with "your language" and discuss
that with me. - I understood that you find it a good idea to implement
an own [irrelevant] language to serve your needs [and preferences].)

Post by Bart
and fiddly rules about semicolon placement.

Huh? - The semicolon placement as delimiters is quite clear and (as so
many things in Algol 68) also clearly defined (IMO). - So what do you
have in mind here?

It just makes life harder. It special-cases the last statement of any
* Adding a new statement to the end of a block, you must apply ; to the
current last statement
* Deleting the last line, you must delete the ; on the previous.
* Move any of the lines about, and you may again need to update the
semicolons if the last was included
* Temporarily comment out lines including the last, you must also
temporarily remove ; from the line before the comments
* Copy the whole block elsewhere, you might need to add ;
* Temporarily comment out a whole block (or start off with an empty
block that will be populated later) you need to use SKIP, another annoyance.

You have problems to separate new statements from existing ones,
and remove that semicolon if you delete/relocate code. - Okay, I
see where you're coming from; I'm feeling with you. - But really,
we're typing tons of code and your problem is the semicolon in
cases where you restructure parts of your code? - (You may be a
candidate for using an IDE that alleviates you from such mundane
tasks.)

With your argumentation I'm curious what you think about having
to add a semicolon in "C" if you replace a {...} block.
Or, in the first place, what you think about semicolons in "C"s
'if-else' construct (with parenthesis-blocks or single statements).
And what's actually the "statement" in 'if(b)s;' and 'else s;'
and what you think about 'if(b){}else{}' being a statement (or
not, since it's lacking a semicolon).
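
For reference, a small C sketch of the cases asked about. In C the semicolon belongs to the expression statement, not to the if, and a compound statement takes none. The function name classify is hypothetical.

```c
int classify(int b)
{
    int r;

    /* Each branch is an expression statement; the ';' is part of it. */
    if (b > 0) r = 1; else r = -1;

    /* Each branch is a compound statement; no ';' follows '}'.
       Writing `if (b == 0) { r = 0; }; else { }` would not compile,
       because the stray ';' is an empty statement that ends the if. */
    if (b == 0) { r = 0; } else { }

    return r;
}
```

So replacing the single statement "r = 1;" by a block means dropping a semicolon, and replacing a block by a single statement means adding one.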

Post by Bart
Usually you're not aware of this until the compiler tells you and you
have to go back in and fix it.

(This is obviously an issue you have; not the language. You should
have better written "Usually I'm not aware of this ...". And that's
of course a fair point [for you].)

Post by Bart
Allow semicolons to be a /terminator/, and all that goes away. It's a no
brainer.

History and also facts of contemporary languages disagree with you.
(Re: "no brainer": You need a brain to understand or know that, of
course. - So my suggestion to you is obvious; inform yourself.)

Post by Bart
But then I don't like having to write semicolons at all, and
generally I don't.

(You're a Python fan/candidate? - It might serve your needs here.)

Frankly, after the era of line-oriented languages, there's a need
in syntactically organized imperative formally specified languages
to separate the statements from each other. (Some designers used
terminators instead for "easier parsing", as they said.) Not using
semicolons at all is commonly seen in shell or scripting languages
where (again) the line termination typically fills the gap. Where I
use Shell or Awk I also avoid spurious semicolons. But that is not
the case in the "regular" ("non-scripting") programming languages.

If I have terminators, good, if I have separators, good, if the
language allows 'Empty Statements', good, if I may use 'SKIP', good,
if I can omit it, good, if my language has line-terminated commands,
fine. - I'm sure that someone who will have a strong opinion of
what's good or not in that list is doomed to implement a language
supporting his own preferences (unless choosing any existing one
that supports it anyway).

Post by Bart
The whole thing with stropping and semicolons is just a colossal
time-waster.

(I have completely different view on that but I accept your opinion.
For example I find it a "colossal time-waster" to write an own
language given the many different existing ones - some even available
in source code to continue working on an existing code base. Colossal
is here a really perfect chosen adjective. - Your scale seems to have
got impaired; you spot marginal time "wastes" and miss the real ones,
qualitatively and quantitatively.)

Post by Bart
As for better languages than C, there were very few at that level.

(But you know you can use Algol 68 on a system development level; we
can read that it had been done in those days. - All that's "missing",
and that's a good design decision, were pointers.)

[ re-iterated speed argument in comparison with "own" languages
while completely neglecting the other factors (including speed
of development process) snipped ]
It's quite unsuited to systems programming, and not just because of its
execution speed. However, I'd quite like to see A68G implemented in A68G!

I've heard and read, as I said, a differing thing about that.
Specifically I recall to have read about that special topic you
mention of writing an Algol 68 compiler in Algol 68; it has been
done.

(Your personal preferences and enthusiasm should not get in the way
of either checking the facts or formulating your opinions/thoughts as
what they are, here basically wrong assumptions based on ignorance.)

Post by Bart
Algol68 was a fascinating and refreshing language back then. It looked
great when typeset in a book. But its practicalities were annoying, and
now it is quite dated.

You are generalizing and (beyond stropping and semicolons) vague
about "its practicalities". (And the two specific design decisions
you obviously have issues with are not an Algol 68 specific thing.)

It makes me smile if you speak about "looking great when typeset",
given that the languages we use nowadays, specifically (e.g.) "C",
C++, don't even look good "when typeset". And the problems you/we
buy with that are directly observable in the languages. Rather we
seem to have accepted all their deficiencies and just work through
(or around) them. Most do that with no complaints. What I find
astonishing is that you - here known to complain about a lot of "C"
details - are now praising things (and at the same time despise
sensible concepts in an exceptionally well designed language as
Algol 68).

YMMV.

Janis

Post by Bart
[...]

Bart

2024-12-01 16:42:02 UTC

Post by Bart
Yes, but they made writing, reading and maintaining source code
impossible. [...]

Really? - For me it's exactly the opposite; having the keywords stand
out lexically (or graphically) is what adds to legibility!

In my syntax, you can write keywords in capitals if you want. It's
case-insensitive! People using my scripting language liked to capitalise
them. But now colour-highlighting is widely used.

Post by Janis Papanagnou
(I admit that hitting the Shift or the Caps-Lock key may be considered
cumbersome by [some/most] people. - I "pay the price" for legibility.)

There's a lot of Shift and Caps-Lock with writing C or C-style syntax.

Post by Bart
[ snip examples of "Bart's language" ]

(It makes no sense to compare Algol 68 with "your language"

I had to look back to see what examples I'd posted. It seems you're
referring to my backtick examples.

I was just saying that there are ways to use reserved words as
identifiers in the rare cases that are necessary. I think C# uses "@"
for example. In my case I sometimes need case-sensitive ones too.

The point is, these are exceptions; Algol68 requires every reserved
word, which includes type names, to be stropped. It gives a very
peculiar look to source code, which you see very rarely in other languages.

Post by Janis Papanagnou
that with me. - I understood that you find it a good idea to implement
an own [irrelevant] language

You keep saying that. It's a real language and has been tried and tested
over decades. Maybe it would be better if I'd just made up hypothetical
features and posted about ideas?

Post by Janis Papanagnou
(You may be a
candidate for using an IDE that alleviates you from such mundane
tasks.)

I use a syntax that alleviates me from that!

Many languages allow trailing commas in multi-line lists. The reason is
EXACTLY to simplify maintenance. But you're suggesting it is only me who
has such a problem with this stuff. Obviously others do as well.

Post by Janis Papanagnou
With your argumentation I'm curious what you think about having
to add a semicolon in "C" if you replace a {...} block.

That's just more fun and games. I don't get the rules there either.
Sometimes "};" is needed; sometimes it's not needed but is harmless;
sometimes it can cause an error.
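
A small C sketch of those three cases, for anyone following along. The names (struct point, primes, classify) are made up for illustration only; the failing form is left in a comment so the rest still compiles.

```c
/* Needed: a struct declaration ends with ';' after the closing brace. */
struct point {
    int x, y;
};

/* Needed: a brace-enclosed initializer is part of a declaration. */
static const int primes[] = { 2, 3, 5, 7 };

/* Hypothetical helper, used only to show where ';' may follow '}'. */
static int classify(int n)
{
    int result = 0;

    /* Harmless: the ';' after this block is just an empty statement. */
    {
        result = primes[0];
    };

    /* Error if uncommented: the ';' ends the if statement, so the
       'else' has nothing left to attach to.

       if (n > 0) { result = 1; }; else { result = -1; }
    */
    if (n > 0) { result = 1; } else { result = -1; }

    return result;
}
```

Calling classify(5) gives 1 and classify(-2) gives -1; the stray ';' after the inner block changes nothing.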

Post by Janis Papanagnou
Or, in the first place, what you think about semicolons in "C"s
'if-else' construct (with parenthesis-blocks or single statements).
And what's actually the "statement" in 'if(b)s;' and 'else s;'
and what you think about 'if(b){}else{}' being a statement (or
not, since it's lacking a semicolon).

That's something else that Algol68 fixed, and which other languages have
copied (Lua for one).

Post by Janis Papanagnou
(This is obviously an issue you have; not the language. You should
have better written "Usually I'm not aware of this ...". And that's
of course a fair point [for you].)

The last bit of Algol68 I wrote, approximately half my time was dealing
with ";" errors or "SKIP", or forgetting to use upper case for keywords.
Fact.

Post by Bart
Allow semicolons to be a /terminator/, and all that goes away. It's a no
brainer.

History and also facts of contemporary languages disagree with you.
(Re: "no brainer": You need a brain to understand or know that, of
course. - So my suggestion to you is obvious; inform yourself.)

Lots of languages have also done away with semicolons, or arranged
things so that they rarely need to be written.

Post by Janis Papanagnou
For example I find it a "colossal time-waster" to write an own
language given the many different existing ones

Not at the time I started doing that. Certainly not in a form that was
available to me.

So the language already exists, and I'm just evolving it.

I was going to give it up in 1992 and switch to C (I had in mind
changing jobs). Then I had another look at C - and changed my mind!

Post by Janis Papanagnou
- some even available
in source code to continue working on an existing code base. Colossal
is here a really perfect chosen adjective. - Your scale seems to have
got impaired; you spot marginal time "wastes" and miss the real ones,
qualitatively and quantitatively.)

I put a lot of weight on syntax; obviously you don't.

My syntax makes typing easier because it is case-insensitive, there is
considerably less punctuation, it's not fussy about semicolons, it
allows type-sharing more, it doesn't need separate declarations, or
headers, or ....

The end result is that less text needs to be typed, source looks cleaner
and it's less error prone. I don't need to write:

for (int index = 0; index < N; ++index)

for example. Or, to share a named entity, I don't need to write two
versions of it, one here and the other in a shared header. You don't
think that is a good thing?

So what bad language features do you think are time-wasters that I
should instead look at?

Post by Bart
It's quite unsuited to systems programming, and not just because of its
execution speed. However, I'd quite like to see A68G implemented in A68G!

I've heard and read, as I said, a differing thing about that.
Specifically I recall to have read about that special topic you
mention of writing an Algol 68 compiler in Algol 68; it has been
done.

I'm sure it has. My point about A68G is that it is an interpreter, a fairly
slow one. So how fast would A68 code run under an interpreter running
under A68G?

Post by Janis Papanagnou
(Your personal preferences and enthusiasm should not get in the way
of either checking the facts or formulate your opinions/thoughts as
what they are, here basically wrong assumptions based on ignorance.)

Really? I've written countless compilers and interpreters. Mainly I
devised systems programming languages. You think I don't know my field?

IMO A68 is unsuitable for such things, and A68G doubly so.

Post by Janis Papanagnou
It makes me smile if you speak about "looking great when typeset",
given that the languages we use nowadays, specifically (e.g.) "C",
C++, don't even look good "when typeset".

Yeah. The first time I saw C code was in K&R1, in a book I bought in
1982 (for £12; a lot of money). It looked dreadful. The typeface used
made it look anaemic. That really put me off, more than the practical
problems.

I didn't consider it again until 1992 as I said, because I would have
needed a new compiler for my language to work with Windows. I ended up
writing that new compiler (and became self-employed).

Post by Janis Papanagnou
And the problems you/we
buy with that are directly observable in the languages. Rather we
seem to have accepted all their deficiencies and just work through
(or around) them. Most do that with no complaints. What I find
astonishing is that you - here known to complain about a lot of "C"
details - are now praising things (and at the same time despise
sensible concepts in an exceptionally well designed language as
Algol 68).

I admire languages that adapt and evolve. Fortran for example. C adapted
poorly and slowly. Algol68 apparently hasn't evolved at all. I guess it
couldn't do without changing its RR, a big undertaking.

Which means it's stuck in the 1960s with some dated design choices.

BTW below is an actual example of Algol68 for A68G. It shows various
issues, other than syntax (but notice those jarring ";" after the END of
each function). You can't mix signed/unsigned arithmetic easily, it
needs BITS, which are awkward to initialise.

It is really dreadful. It makes writing in C attractive!

(Below that is my version in the Algol68-inspired syntax, but you can
see it looks quite different - and shorter.)

BTW under A68, the 100,000-loop ran in 11 seconds.

With my compiler, the 100,000,000-loop ran in 1.5 seconds (1 second if
optimised). So apparently 10,000 times faster.

If I use 100,000 in my version, and get my compiler to interpret it
(very slowly), it takes only 0.1 seconds, still 100 times faster!

(I think those long types slow down the A68 version.)

----------------------------------------------
MODE LI = LONG INT,
LB = LONG BITS;

LB mask = 8r 1 777 777 777 777 777 777 777;
LI mod = ABS mask + 1, l2 = 2;
LI t23 = l2^23, t25 = l2^25, t39 = l2^39, t41 = l2^41, t63 = l2^63;

[0:20631] LI q;

INT flag:=0;

LI carry := 36243678541,
xcng := 12367890123456;
LB xs := BIN LONG 521288629546311;

INT indx := UPB q+1;

PROC cng = LI: BEGIN
xcng *:= 6906969069 +:= 123 %*:= mod
END;

PROC xxs = LI: BEGIN
ABS ( xs := xs XOR (xs SHL 13 AND mask);
xs := xs XOR xs SHR 17;
xs := xs XOR (xs SHL 43 AND mask) )
END;

PROC supr = LI: BEGIN
LI s;
IF indx <= UPB q THEN
s:=q[indx];

indx+:=1
ELSE
s:=refill
FI;
s
END;

PROC kiss = LI: BEGIN
LI s,c,x;

s:=supr;
c:=cng;
x:=xxs;

(s+c+x) %* mod
END;

PROC refill = LI: BEGIN
FOR i FROM 0 TO UPB q DO
LI h = ABS ODD carry,
z = ABS (q[i]*t41%2 + q[i]*t39%2 + carry%2) %* mod;
carry := ABS (q[i]%t23 + q[i]%t25 + z%t63) %* mod;
q[i] := ABS NOT BIN (z*2 + h) %* mod
OD;
indx:=1;
flag+:=1;

q[0]
END;

LI x;

FOR i FROM 0 TO UPB q DO q[i] := (cng + xxs) %* mod OD;

FOR n TO 100000
DO
x := kiss
OD;

print (x)

----------------------------------------------

[0..20631]word Q
word carry = 36243678541
word xcng = 12367890123456
word xs = 521288629546311
word indx = Q.len

function refill:int =
word h,z, cy

cy:=carry
for i in Q.bounds do
h := cy iand 1
z := (Q[i]<<41)>>1 + (Q[i]<<39)>>1 + cy>>1
cy := Q[i]>>23 + Q[i]>>25 + z>>63
Q[i] := inot (z<<1+h)
od
indx:=1
carry:=cy
Q[Q.lwb]
end

macro kiss = supr() + cng + xxs

function supr:int s=
if indx <= Q.upb then
Q[indx++]
else
refill()
fi
end

macro xxs=(xs :=xs ixor xs<<13; xs :=xs ixor xs>>17; xs :=xs ixor xs<<43)

macro cng = xcng:=(6906969069) * xcng + (123)

proc main=
word x

for i in Q.bounds do
Q[i] := cng + xxs
od

to 100'000'000 do
x:=kiss
od

println "x =",x
end
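
For readers who know neither syntax, here is roughly the same generator as a C sketch. It is transcribed from the second listing on the assumption that 'word' is a 64-bit unsigned integer, so that uint64_t wrap-around plays the role of the "%* mod" in the Algol68 version. The function names kiss_init and kiss_run are mine, not from either listing, and no timings are claimed for it.

```c
#include <stdint.h>

#define QSIZE 20632u            /* matches [0:20631] in both listings */

static uint64_t Q[QSIZE];
static uint64_t carry, xcng, xs;
static uint32_t indx;

/* Congruential step; unsigned overflow wraps modulo 2^64. */
static uint64_t cng(void)
{
    return xcng = UINT64_C(6906969069) * xcng + 123u;
}

/* Xorshift step. */
static uint64_t xxs(void)
{
    xs ^= xs << 13;
    xs ^= xs >> 17;
    xs ^= xs << 43;
    return xs;
}

/* Regenerate the whole table, then hand back its first element. */
static uint64_t refill(void)
{
    for (uint32_t i = 0; i < QSIZE; i++) {
        uint64_t h = carry & 1;
        uint64_t z = ((Q[i] << 41) >> 1) + ((Q[i] << 39) >> 1) + (carry >> 1);
        carry = (Q[i] >> 23) + (Q[i] >> 25) + (z >> 63);
        Q[i] = ~((z << 1) + h);
    }
    indx = 1;
    return Q[0];
}

static uint64_t supr(void)
{
    return indx < QSIZE ? Q[indx++] : refill();
}

static uint64_t kiss(void)
{
    uint64_t s = supr();
    uint64_t c = cng();
    uint64_t x = xxs();
    return s + c + x;
}

/* Hypothetical helper: reset the state to the constants used in the
   listings and seed the table. */
static void kiss_init(void)
{
    carry = UINT64_C(36243678541);
    xcng  = UINT64_C(12367890123456);
    xs    = UINT64_C(521288629546311);
    indx  = QSIZE;
    for (uint32_t i = 0; i < QSIZE; i++) {
        uint64_t c = cng();
        uint64_t x = xxs();
        Q[i] = c + x;
    }
}

/* Hypothetical helper: run n steps from a fresh state, return the last value. */
static uint64_t kiss_run(uint64_t n)
{
    uint64_t x = 0;
    kiss_init();
    while (n-- > 0)
        x = kiss();
    return x;
}
```

kiss_run(100000) would correspond to the A68 loop and kiss_run(100000000) to the longer one, under the assumptions above.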

James Kuyper

2024-11-30 14:10:52 UTC

On 11/29/24 22:25, Janis Papanagnou wrote:
...

No existing language meets Bart's needs as well as his own does. He
attributes this to all of the other language designers being idiots for
creating those languages, and to all the other languages' users being
idiots for not rejecting those languages. He refuses to accept the
possibility that his own preferences for language design might be
somewhat idiosyncratic.

Bart

2024-11-30 17:59:41 UTC

Post by James Kuyper
...

No existing language meets Bart's needs as well as his own does.

Post by James Kuyper
attributes this to all of the other language designers being idiots for
creating those languages, and to all the other languages' users being
idiots for not rejecting those languages.

In this group, I am only saying that my systems language does systems
programming better than C does.

It fixes lots of the problems and issues with C.

C has changed little. There might be valid reasons for that (although a
lot more could have been done in 50 years: Fortran has changed beyond
recognition in just 40).

Post by James Kuyper
He attributes this to all of the other language designers being idiots
for creating those languages

When other languages require you to write:

@import("std").debug.print("A={} B={}\n", .{a, b});

while I normally write:

println =a, =b

then you're damn right about what I think of the creator of that
language! Zig, in case you don't recognise it.

Which also, initially, refused to recognise CRLF line endings in source
files, because the creator hated Microsoft. And which still bans the use
of hard tabs.

Post by James Kuyper
He refuses to accept the
possibility that his own preferences for language design might be
somewhat idiosyncratic.

Actually, my stuff is remarkably conservative. Being simple and clean is
a characteristic.

Janis Papanagnou

2024-12-01 09:47:05 UTC

Post by James Kuyper
...

It would probably have been good - and I don't say that ironically
or anything - if Bart would either have presented his language(s)
on the market (to be part of the competition process for the best
product), or if he would have participated in the standardization
committees (to get his design ideas discussed, and then implemented
or dismissed).

Janis

Bart

2024-11-30 11:46:24 UTC

Post by Bart
Perhaps you can post a trivial bit of C code which reads in C source
code and shows the first lines of all the function definitions, not
prototypes nor function pointers. It can assume that each starts at
the beginning of a line.

By itself? Sure. Within a large very busy source file, it'll get lost in
the noise. Where is the start of each function? I don't want to analyse
each line!

A week ago somebody on reddit posted a link to a C project. The source
code was unusual: it was 'clean' for a start, but also each function
started with:

func ....

Presumably that was some empty macro; I don't recall.

But an amazing thing happened: if viewed within my editor, I could
navigate between functions with PageUp and PageDown keys. That's never
happened before with C. (Note this is /my/ crude text-mode editor.)

Normally, to do that, I'd have to use a visualisation tool to turn C
code into my syntax, then it becomes instantly clear. (Unfortunately I
can't compile it; that would need a lot more work.)

Post by Keith Thompson
I acknowledged elsewhere that I forgot about declarations of static
functions. (Hundreds of function definitions in a single source file
seem unlikely.)

I'm working with a single-file Lua implementation. It has 880 function
definitions. My SQL test file has 2100 functions.

Bart

2024-11-30 14:44:09 UTC

int foo(int);
is intended to be a function declaration,

This project:

https://github.com/PascalBeyer/Headerless-C-Compiler/tree/main/src

Although it still won't let me easily search for a specific function,
since there is still an arbitrary type between 'func' and the function name.

Janis Papanagnou

2024-11-30 03:13:21 UTC

[...]
In my language [...],
variables declared in the same declaration have 100% the same type. If
they are even 1% different, then that is a separate type and they need
their own declarations. There are no gradations!

Reminds me of the Pascal days, where (depending on the Pascal dialect)
there were differences what grade of Strong Typing was implemented.
Is a homonymous type declarator the same type, or only if the same
'type' declaration is used? - was a question that had been answered
differently by Pascal implementations. (IIRC)

Janis

Janis Papanagnou

2024-11-30 03:03:39 UTC

[...]

The point is that there are restrictions on what can be combined into a
single declaration. But these days it's usually considered good style
to declare only one identifier in each declaration, [...]

For quite large values of "these days"; for more than 3 decades as far
as my observation goes. - Deviations I've mostly only seen in contexts
of, say, uninitialized loop variables, like 'int i, j, k;'. But there's
also some folks who just don't care. That's why we had that regulated
with our coding standards in these early days.

Janis

David Brown

2024-12-01 11:34:41 UTC

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Neither F nor i is a type. i is an object (of type int), and F is a
function (of type int(int, int, int, int, int, int, int, int, int, int,
int, int, int, int)).

Post by Bart
That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

It's relevant to the syntax. i and F can be declared in the same
declaration only because the type of i and the return type of F happen
to be the same. If F returned void, i and F would have to be declared
separately.
Which, of course, is a good idea anyway.
You're posting repeatedly trying to convince everyone that C allows
ridiculous code. We already know that. You are wasting everyone's time
telling us something that we already know. Most of us just don't obsess
about it as much as you do. Most of us recognize that, however
convoluted C's declaration syntax might be, it cannot be fixed in a
language calling itself "C".
Most of us here are more interested in talking about C as it's
specified, and actually trying to understand it, than in complaining
about it.
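
The point about F and i sharing a declaration can be sketched in a few lines of C. The two-parameter F and the function G are invented for the example (the thread's F has 14 parameters); the rule shown is the one described above.

```c
/* One declaration, two identifiers: an object and a function.
   Allowed only because both are built on the same base type, int. */
int i, F(int, int);

/* A function returning void cannot share a declaration with an int
   object; it needs a declaration of its own. */
void G(int);

int F(int a, int b) { return a + b; }
void G(int n) { i = n; }
```

Whether anyone should write the first line is a separate question; it is merely valid.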

No, the object D unambiguously has type int[3][4][5]

(So it would have a different type from E declared in the same
int D[3][4][5], E;
? In that case tell that to David Brown!)

Yes, of course D and E have different types. I'm certain he's
aware of that.

Yes, I am. (I know you wisely dislike speaking for other people, but
you are pretty good at it!)

Post by Keith Thompson
I wrote that the object D is unambiguously of type int[3][4][5], and the
expression D can be of the array type int[3][4][5] or of the pointer
type int(*)[4][5], depending on the context. Do you agree? Or do you
still claim that D can have any of "4 or 5 possible types"?
(Note that I'm not talking about the type of the expression D[i] or of
any other expression that includes D as a subexpression.)
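
A compiler can be asked directly which types are involved. The sketch below assumes C11 or later (for _Generic) and a compiler that applies the usual array-to-pointer conversion to the controlling expression, as GCC and Clang do; the macro and function names are made up for the example.

```c
#include <stddef.h>

int D[3][4][5];

/* 1 if the expression, after array-to-pointer conversion, is a
   pointer to int[4][5]. */
#define DECAYS_TO_ROW_PTR(e)  _Generic((e), int (*)[4][5]: 1, default: 0)

/* 1 if the address of the expression is a pointer to the whole
   int[3][4][5] array. */
#define ADDR_IS_ARRAY_PTR(e)  _Generic(&(e), int (*)[3][4][5]: 1, default: 0)

/* sizeof does not convert its operand, so it sees the array type. */
static size_t d_elem_count(void)
{
    return sizeof D / sizeof(int);
}
```

Here DECAYS_TO_ROW_PTR(D) and ADDR_IS_ARRAY_PTR(D) are both 1, and d_elem_count() is 60: one object type, and one converted pointer type in most expression contexts.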

David apparently has a different definition of "totally different types"
than you do. Since the standard doesn't define that phrase, I suggest
not wasting time arguing about it.

"int", "void" and "double" are totally different types in my view.
"int", "pointer to int", "array of int", "function returning int" all
have a relation that means I would not describe them as /totally/
different types - though I would obviously still call them /different/
types.

The syntax of C allows one declaration statement to declare multiple
identifiers of types related in this way - it does not allow declaration
of identifiers of /totally/ different types.

That was the point I was trying, and clearly failing, to explain to Bart.

Post by Keith Thompson
int D[3][4][5], E;
the object D is of type int[3][4][5], and E is of type int. Do you
understand that?
If you wanted to change the type of D from int[3][4][5] to
double[3][4][5], you'd have to use two separate declarations.
Do you understand that? (Of course you do, but will you admit that
you understand it?)
I think that distinction is what David had in mind. double[3][4][5] and
int are "totally different types", but int[3][4][5] and int are not.
Entities of "totally different types" cannot be declared in a single
declaration. You don't have to accept that meaning of the phrase (which
I find a bit vague), but it's clearly what David meant.

It is certainly a vague term - there is no well-defined difference
between "totally different types" and "different types". But since Bart
specifically called them /totally/ different types, the only way I could
interpret that is suggesting that they could be any types at all. And
as we all know (even Bart, though he seems determined to feign
ignorance), multiple identifiers in the same declaration cannot be of
completely independent types.

Post by Keith Thompson
The point is that there are restrictions on what can be combined into a
single declaration. But these days it's usually considered good style
int i, *p;
int i;
int *p;
is preferred by most C programmers.
Do you understand that?

I was trying to explain that this is the principle C syntax uses - "A" and
"D" have different types, but expressions of the same format as used in
the common declaration have a common type.

Post by Bart
But of course they may be evaluated only partially, yielding a range
of types.

What "range of types" do you think D can have?

Post by Keith Thompson
Would you write "const int F();"? Or would you omit the "const"? How
does the fact that "const" is allowed inconvenience you?

It's another point of confusion. In my language I don't treat function
declarations like variable declarations. A function is not a
variable. There is no data storage associated with it.

In C, declarations can declare objects, functions, types, etc. I fail
to see how your language is relevant.

Post by Bart
In C it is unfortunate, as it makes it hard to trivially distinguish a
function declaration (or the start of a function definition) from a
variable declaration.

It's not as hard as you insist on pretending it is. A function
declaration includes a pair of parentheses, either empty or
containing a list of parameters or parameter types.
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

A function definition - as typically written - is also a function
declaration. So presumably you mean non-defining declaration here.

Some people have a style where they write forward declarations of all
functions defined in a C file near the top of the file. I am not a fan
of that myself - especially as over time, this redundant information is
rarely kept fully in sync with the rest of the code. But it is
definitely something you'll sometimes see in real-world C code. (You
could argue that the code is then not "well-written" C, but that would
be a very subjective opinion.)
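
As a concrete case where a non-defining declaration cannot be avoided, here is a minimal sketch with two mutually recursive static functions. The names is_even and is_odd are illustrative only.

```c
#include <stdbool.h>

/* Non-defining declaration: lets is_even() call is_odd() before the
   compiler has seen the definition of is_odd(). */
static bool is_odd(unsigned n);

static bool is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

/* This definition is also a declaration. */
static bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```

Reordering the two definitions does not help here; one of them always has to be declared before it is defined.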

Bart

2024-12-01 12:03:32 UTC

Post by David Brown
"int", "void" and "double" are totally different types in my view.
"int", "pointer to int", "array of int", "function returning int" all
have a relation that means I would not describe them as /totally/
different types - though I would obviously still call them /different/
types.

What about 'array of int', 'array of double' and 'array of void*'; do
they have a relation too?

Your examples, expressed left-to-right, happen to share the last
element. They could also share other elements; so what?

Post by Keith Thompson
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

That's a separate problem. But without forward declarations, at some
point you're going to add some expression in the middle of the file, but
find you're calling a function which is declared later on in the file
rather than earlier.

That's not something you want to waste time thinking about.

When I first wrote a big C program, I used a 300-line script to convert
a thinly syntax-wrapped version of C, into actual C.

The script also generated an include file of all static function
declarations (which was included in this module) and a separate file of
exported function declarations.

Everything was automatically kept in sync.

Now, why can't the language do that?

David Brown

2024-12-01 14:34:04 UTC

What about 'array of int', 'array of double' and 'array of void*'; do
they have a relation too?

In a different context, yes - they can all be used in expressions of the
form "a[i]". Different context, different relationship.

Post by Bart
Your examples, expressed left-to-right, happen to share the last
element. They could also share other elements; so what?

All I was saying is that it is wrong to say a single declaration can
have identifiers of /totally/ different types - the types you are
allowed to use in one declaration have a very clear relationship with
each other.

Post by Keith Thompson
Function declarations outside header files are valid, but tend to be
rare in well-written C code.

It's not something I find to be a noticeable issue. I prefer, on the
whole, to avoid forward declarations - I like to order my code so that
functions are defined before they are called. It makes it easier to see
the structure of the code and call trees. There are exceptions when I
feel a different order is clearest, but those are rare and I can put up
with writing static function declarations in those cases. (For
non-static functions, there is invariably a declaration in a matching
header file.)

I can see some advantages in a language being happy with any order of
function definition, without requiring forward declarations to use a
function before it is defined. But C is not like that, and I cannot
honestly say it bothers me one way or the other. And apparently, it
does not particularly bother many people - there is, I think, no serious
impediment or backwards compatibility issue that would prevent C being
changed in this way. Yet no one has felt the need for it - at least not
strongly enough to fight for it going in the standard or being a common
compiler extension.

Michael S

2024-12-01 16:57:40 UTC

On Sun, 1 Dec 2024 15:34:04 +0100

Post by David Brown
I can see some advantages in a language being happy with any order of
function definition, without requiring forward declarations to use a
function before it is defined. But C is not like that, and I cannot
honestly say it bothers me one way or the other. And apparently, it
does not particularly bother many people - there is, I think, no
serious impediment or backwards compatibility issue that would
prevent C being changed in this way. Yet no one has felt the need
for it - at least not strongly enough to fight for it going in the
standard or being a common compiler extension.

I think, arguing in favor of such change would be easier on top of
the changes made in C23.
Before C23 there were, as you put it "no serious impediment or
backwards compatibility issue". After C23 we could more categorical
claim that there are no new issues.

David Brown

2024-12-01 18:33:08 UTC

Post by Michael S
On Sun, 1 Dec 2024 15:34:04 +0100

Does that mean there was something that you think was allowed in C
before C23, but not after C23, that would potentially be a problem here?

What, specifically, are you thinking of?

The changes in declarations in C23, AFAIK, are that "void foo();" now
means the same as "void foo(void);", and that in general non-prototype
function declarations are no longer allowed. (That is a good change, of
course - 30 years late, but still welcome.) But I can't think of a
situation where code that is correct under the current "declare before
using" rule would no longer be correct if "declare before or after" were
allowed at the file-scope level.

Maybe you could construct something by playing silly-buggers with macros
to re-define identifiers at different points in the code.

I can't say I've thought much about the consequences of such a rule
change - it doesn't seem a realistic change to me, and I have no problem
with the current rule. So it's quite possible that I've missed something.

Keith Thompson

2024-12-01 22:04:17 UTC

[...]

Post by Keith Thompson
David apparently has a different definition of "totally different
types" than you do. Since the standard doesn't define that phrase, I
suggest not wasting time arguing about it.

The following is not intended as a criticism. I think "totally
different types" was a throwaway phrase applying to a specific context,
and I have no problem with that. It refers to types that cannot be
combined in a single declaration. I don't think it's a suitable term for
more general use (and I'm sure it wasn't intended to be). There might
be a clearer phrase, but I don't really think we need a term for it at
all (outside that one throwaway context).

If we needed a definition, we could refer to the discussion of "derived
types" in section 6.2.5 (Types) of the C standard.

And you explained it clearly enough when you first used it.

[...]

Post by David Brown
A function definition - as typically written - is also a function
declaration. So presumably you mean non-defining declaration here.

Yes.

Post by David Brown
Some people have a style where they write forward declarations of all
functions defined in a C file near the top of the file. I am not a
fan of that myself - especially as over time, this redundant
information is rarely kept fully in sync with the rest of the code.
But it is definitely something you'll sometimes see in real-world C
code. (You could argue that the code is then not "well-written" C,
but that would be a very subjective opinion.)

Yes, that was an oversight on my part.

If someone wanted to ensure that all static functions defined in a
translation unit are declared near the top, there could be a separate
tool to generate, or at least check, the declarations. I'm not aware of
any such tool, which suggests there probably isn't much demand for it.
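
To give an idea of how small the core of such a tool could be, here is a rough sketch of one step: turning the first line of a static function definition into a prototype. It is a line-based heuristic, not a parser. It assumes the definition starts in column 0 with "static " and that the parameter list closes on the same line; the function name is invented for the example.

```c
#include <string.h>

/* Heuristic only: returns 1 and writes a prototype into out when line
   looks like the first line of a static function definition, such as
   "static int f(int x) {" or "static int f(int x)". Returns 0 for
   anything else, including existing prototypes and initialised objects. */
static int static_def_to_prototype(const char *line, char *out, size_t outsz)
{
    size_t len = strlen(line);

    /* Must start in column 0 with the keyword "static ". */
    if (strncmp(line, "static ", 7) != 0)
        return 0;

    /* Trim trailing whitespace and an optional opening brace. */
    while (len > 0 && strchr(" \t\r\n{", line[len - 1]) != NULL)
        len--;

    /* A definition header now ends with ')'. A prototype ends with ';'
       and an array initialiser ends with '=', so both are rejected. */
    if (len == 0 || line[len - 1] != ')')
        return 0;

    /* Room for the text, the ';' and the terminating null. */
    if (len + 2 > outsz)
        return 0;

    memcpy(out, line, len);
    out[len] = ';';
    out[len + 1] = '\0';
    return 1;
}
```

Given "static int f(int x) {" it produces "static int f(int x);". Multi-line parameter lists, macros and comments would all need real parsing, which is presumably where the work is.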

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

David Brown

2024-11-30 15:57:41 UTC

Post by Michael S
On Fri, 29 Nov 2024 13:33:30 +0000

Post by Bart
* It allows a list of variable names in the same declaration to each
have their own modifiers, so each can be a totally different type

They can't have "totally different" types - they can have added
indirection or array indicators, following C's philosophy of
int x, *y, z[10];
Thus "x", "*y" and "z[i]" are all of type "int".

C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

And the laws of physics allow me to drop a 20 kg dumbbell on my toe.
That does not mean that anyone thinks it is a good idea.

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Functions have types in most typed languages, including C.

And yes, F and i are different types - but they are related types. Use
the declared identifier in an expression of a form matching what you
wrote in the declaration, and the expression will have type "int".
That's how C's declarations work.
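
The "declaration mirrors use" rule can be checked mechanically. This sketch assumes C11 or later for _Generic; the IS_INT macro and the function name are made up for the example.

```c
int x, *y, z[10];

/* 1 if the expression has type int, 0 otherwise. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Each identifier, used in the same form as in its declarator,
   yields an expression of the base type int. */
static int all_forms_are_int(void)
{
    y = &x;   /* give y a valid target; _Generic itself evaluates nothing */
    return IS_INT(x) && IS_INT(*y) && IS_INT(z[3]);
}
```

all_forms_are_int() returns 1, while IS_INT(y) and IS_INT(z) are both 0: the identifiers have different types, but the declarator forms all come back to int.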

Post by Bart
That F(1, 2, 3.0, "5", "six", seven, ...) might yield the same type as
'i' is irrelevant here.

No, it is exactly the point - it is how C is defined.

Post by Bart
int A[100]
int *B;
int (*C)();
people would consider the types of A, B and C to be array, pointer and
function pointer respectively. Otherwise, which of the 4 or 5 possible

They are different - but related - types.

Post by Bart
int D[3][4][5];
It depends on how it is used in an expression, which can be any of &D,
D, D[i], D[i][j], D[i][j][k], none of which include 'Array' type!
const int F();
why is 'const' allowed here? There is no storage involved. It's not as
though you could write 'F = 0' if there was no 'const'.

"const" is allowed here because it is part of the type returned by F().

Really, most of this is pretty straightforward. No one is asking you to
/like/ the rules of C's declarations (I personally dislike that a single
declaration can be used for different types, even if they are related).
But /please/ stop pretending it's difficult to understand.

Post by David Brown
C allows this, but I personally would be happier if it did not. As
Michael says below, most serious programmers don't write such code.

It doesn't matter. If you're implementing the language, you need to
allow it.

I am not implementing the language. No one else here is implementing
it. You have, apparently, implemented at least some of the language
while being completely incapable of understanding it. I am surprised
that is possible, but it is what you seem to be claiming.

Post by Bart
If trying to figure out why some people have trouble understanding, it's
something to consider.

I have met a good many C programmers over the years, and some of them
have had misunderstandings. I have never heard of anyone who comes
close to your level, however.

For most people new to C, it's enough to tell them that "int* a, b;"
declares "a" as a "pointer to int" and "b" as an "int". You tell them
it is a bad idea to write such code, even re-arranged as "int *a, b;",
because it is easy to get wrong - they should split the line into two
declarations (preferably with initialisations). The C newbie will thank
you for the lesson, and move on to write C code without writing such
mixed declarations.
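
A short sketch of the two styles side by side. All names here (value, a, b, ptr, count, sum_mixed, sum_split) are invented for the example.

```c
int value = 42;

/* Misleading layout: the '*' binds to a only, so a is a pointer to
   int and b is a plain int. */
int* a = &value, b = 7;

/* Clearer: one identifier per declaration, each initialised. */
int *ptr = &value;
int count = 7;

/* Both pairs behave identically; only the readability differs. */
static int sum_mixed(void) { return *a + b; }
static int sum_split(void) { return *ptr + count; }
```

Both functions return 49; the split form just makes it obvious which identifier is the pointer.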

/That/ is how you solve problems with syntax that can be abused to write
unclear code.

Post by Bart
It's also something to keep in mind if trying to understand somebody
else's code: are they making use of that feature or not?
So this is a wider view than just dismissing design misfeatures just
because you personally won't use them.

So what is your preferred solution? Whine endlessly on newsgroups for
decades on end about the same things you dislike, to people who have
absolutely no influence on the thing that bothers you? How productive
has that been for you?

The sane response to a "design misfeature" is to avoid using it
yourself, and encourage others to stop using it when you see them doing so.

Post by Bart
With the kind of C I would write, you could discard everything after
C99, and even half of C99, because the subset I personally use is very
conservative.

You say that as though you think it is a good thing - it is not.

Bart

2024-11-30 17:38:19 UTC

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the same
statement as a simple int 'i'.

And the laws of physics allow me to drop a 20 kg dumbbell on my toe.
That does not mean that anyone thinks it is a good idea.

Who said it's a good idea? I merely said that C allows such disparate
types in declarations. You disagree that they are different types, while
at the same time saying it's a bad idea to mix them in the same declaration!

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Functions have types in most typed languages, including C.
And yes, F and i are different types - but they are related types. Use
the declared identifier in an expression of a form matching what you
wrote in the declaration, and the expression will have type "int".
That's how C's declarations work.

That's not how people's minds work. If you declare A, B, and C, then
what is important is the types of A, B, and C, not what might be yielded as
the result of some expression.

People can write A in an expression using all the modifiers specified
for it, or they can write it with none or some of the modifiers, or they
might write &A; generally they will yield different types.

If I write this

int *A, B[10], C(int);

My compiler tells me that:

A is a local variable with type 'ref i32' (expressed in other syntax)
B is a local variable with type '[10]i32'
C is a function with return type of 'i32', taking one unnamed
parameter of type 'i32'.

(Interestingly, it places C into module scope, so the same declaration
can also create names in different scopes!)

However writing:

A; B; C;

creates expressions with types 'ref i32', 'ref i32', and 'ref
proc(i32)i32' according to C rules. All quite different from your claim
that they all yield 'int' (while C makes the scalar A and array B the
same type!).

So the key types are what are specified in the symbol table. And those
are different and incompatible.

BTW here are A, B, C in my syntax:

ref i32 A
[10]i32 B

proc(i32)i32 C

(The last is a function declaration, which only exists for FFI functions;
it can only appear in an 'importdll' block.)

I don't think anyone outside the C world would consider even A and B as
having related types - int vs array - let alone C. Yet you are claiming
this.

Post by David Brown
Really, most of this is pretty straightforward. No one is asking you to
/like/ the rules of C's declarations (I personally dislike that a single
declaration can be used for different types, even if they are related).
But /please/ stop pretending it's difficult to understand.

I keep saying that my language is easier than this with simpler rules.
That makes C harder. It also makes it more error-prone. It can make it
more confusing. That is just the truth.

Post by David Brown
C allows this, but I personally would be happier if it did not. As
Michael says below, most serious programmers don't write such code.

It doesn't matter. If you're implementing the language, you need to
allow it.

I am not implementing the language. No one else here is implementing
it. You have, apparently, implemented at least some of the language
while being completely incapable of understanding it.

And you seem utterly incapable of understanding that A implementing some
complicated and badly designed system X doesn't mean that A becomes an
expert in using X, or suddenly thinks of it as simple and well-designed.

Quite the opposite in fact, since they get to see the nitty-gritty
details, and especially where people have papered over the cracks.

They will also get to see a vast amount of badly written code that
breaks all the rules, compared to B who not only uses just their
preferred subset of X, but additional systems Y and Z to hide some of the
problems.

Post by David Brown
/That/ is how you solve problems with syntax that can be abused to write
unclear code.

You solve it by fixing the language. If you can't fix the language then
you use a strong workaround, such as a new syntax wrapper around it.

Post by David Brown
The C newbie will thank
you for the lesson, and move on to write C code without writing such
mixed declarations.

You can't give such a lesson to everyone, so there will still be a
million programmers who pick up bad habits, not helped by the tools they
use being lax.

Post by Bart
With the kind of C I would write, you could discard everything after
C99, and even half of C99, because the subset I personally use is very
conservative.

You say that as though you think it is a good thing - it is not.

Why?

I reckon people will have an easier time understanding and working with
my code than yours. It will at least work with more compilers.

Bart

2024-11-30 20:17:35 UTC

Post by Bart
If I write this
int *A, B[10], C(int);
A; B; C;
creates expressions with types 'ref i32', 'ref i32', and 'ref
proc(i32)i32' according to C rules.
   ref i32 A
   [10]i32 B
   proc(i32)i32 C
(The last is a function declaration, which only exists for FFI functions;
it can only appear in an 'importdll' block.)

Although nobody here will care in the slightest, that last is the wrong
syntax and irks me. It wasn't fully converted from the function pointer
version earlier. The proper syntax would be:

func C(i32)i32

The name is placed earlier. This helps draw a firmer line between
functions and variables.

David Brown

2024-12-01 14:49:42 UTC

Post by Bart
C's syntax allows a 14-parameter function F to be declared in the
same statement as a simple int 'i'.

And the laws of physics allow me to drop a 20 kg dumbbell on my toe.
That does not mean that anyone thinks it is a good idea.

So if we all agree that it's a bad idea, and no one does it, why is it
such a problem for you?

Post by Bart
I'd say that F and i are different types! (Actually I wouldn't even
consider F to be type, but a function.)

Functions have types in most typed languages, including C.
And yes, F and i are different types - but they are related types.
Use the declared identifier in an expression of a form matching what
you wrote in the declaration, and the expression will have type "int".
That's how C's declarations work.

That's not how people's minds work.

Are you extrapolating from how /you/ think, to how everyone else (or at
least, every other C programmer) thinks? Given the negligible support
you have here for most (though not all) of your multitudes of pet hates
in C, I think that's a rather bold approach.

Post by Bart
If you declare A, B, and C, then
what is important is the types of A, B, and C, not what might be yielded as
the result of some expression.

Why do you think that? After all, A, B and C are going to be used
primarily in those types of expression. If "A" is an array, you will
use it as "A[i]" in the majority of cases - and care about the type of
"A[i]" more than the type of "A".

Note - in case you missed it - that personally I would prefer if C did
not allow declaring identifiers of different types together. And I
would probably have preferred a declaration syntax where the identifier
comes either fully before, or fully after, all type-related symbols.

But C is defined the way it is defined, and with good logical
justification. Most C programmers find it helpful to understand that.

Post by Bart
With the kind of C I would write, you could discard everything after
C99, and even half of C99, because the subset I personally use is
very conservative.

You say that as though you think it is a good thing - it is not.

Why?
I reckon people will have an easier time understanding and working with
my code than yours.

You are joking, right? If you are not lying about how confusing you
find C and how error-prone you think it is, then your C code works by
luck. And that is not something people are going to find easy to
understand. (And if you are talking about your own language, then no
one else understands it.)

Post by Bart
It will at least work with more compilers.

And why would that matter? No actual developer would care if their code
can be compiled by your little toy compiler, or even more complete
little tools like tcc. Code needs to work on the compilers that are
suitable for the job - compatibility with anything else would just be a
waste of effort and missing out on useful features that makes the code
better.

Janis Papanagnou

2024-12-01 09:55:30 UTC

Post by David Brown
For most people new to C, it's enough to tell them that "int* a, b;"
declares "a" as a "pointer to int" and "b" as an "int". You tell them
it is a bad idea to write such code, even re-arranged as "int *a, b;",
because it is easy to get wrong - they should split the line into two
declarations (preferably with initialisations). The C newbie will thank
you for the lesson, and move on to write C code without writing such
mixed declarations.

That's why the newbies are preferred for programming; they are flexible,
cheap, and they do what they are told. ;-) (Sorry, could not resist.)

Janis

Tim Rentsch

2024-11-30 17:54:30 UTC

Post by Michael S
IMHO, any way to mix more than one 'modifier' (not in C standard
meaning of the word, but in more general meaning) is potentially
confusing. It does not matter whether modifier is 'const' or '*'
or [] or ().

It surprises me that you would say this. Certainly there are type
forms that might be difficult to absorb (e.g., 'float *********')
but that doesn't mean they are necessarily confusing. There are two
obvious ways to write type forms that are easy to decode. One way
is to write any derivings right-to-left:

[] * (double,double) * float

which can be read directly as "array of pointer to function that
returns a pointer to float", and the other way is simply the reversal
of that:

float * (double,double) * []

which can be read right-to-left the same way. The constructors for
derived types (pointer, array, function) act like nouns. Qualifiers
such as const or volatile act like adjectives and always go to the
left of the noun they modify, so for example

[] volatile* float

is an array of volatile pointer to float, or in the other ordering

float volatile* []

which is simply a reversal of noun phrases, with any modifying
adjectives staying on the left side of the noun they modify.

The syntax used in C is harder to read for two reasons: one, the
ordering of derivations is both left-to-right and right-to-left,
depending on what derivation is being applied; and two, any
identifier being declared goes in the middle of the type rather
than at one of the ends. Both of those confusions can be removed
simply by using a consistent ordering, either left-to-right or
right-to-left (with qualifying adjectives always on the left of
the noun they modify).

Note that both of the consistent orderings correspond directly to a
natural English wording, which accounts for them being easier to
comprehend than C-style type forms. (I conjecture that some foreign
languages might not have that property, but since I am for the most
part ignorant of essentially all natural languages other than
English I have no more to say about that.)

Michael S

2024-12-01 16:47:17 UTC

On Sat, 30 Nov 2024 09:54:30 -0800

Post by Tim Rentsch

That's nice. It's a pity it will never be adopted in C.

Post by Tim Rentsch
The syntax used in C is harder to read for two reasons: one, the
ordering of derivations is both left-to-right and right-to-left,
depending on what derivation is being applied; and two, any
identifier being declared goes in the middle of the type rather
than at one of the ends. Both of those confusions can be removed
simply by using a consistent ordering, either left-to-right or
right-to-left (with qualifying adjectives always on the left of
the noun they modify).
Note that both of the consistent orderings correspond directly to a
natural English wording, which accounts for them being easier to
comprehend than C-style type forms. (I conjecture that some foreign
languages might not have that property, but since I am for the most
part ignorant of essentially all natural languages other than
English I have no more to say about that.)

As you can see, in a reply to my post Bart already suggested that the
problematic part is not a chain itself, but the mixture of terms that have
to be read left-to-right with those that have to be read right-to-left.
You can also see that I already mostly agreed with him.
Mostly rather than completely, because I think that when a single
declaration has more than half a dozen terms it is difficult to
understand even when it does not change direction in the middle.

But my main disagreement with Bart was orthogonal to this discussion.
It was about 'const' not being any harder to understand than other
"right-to-left" type modifiers.

David Brown

2024-11-29 16:34:01 UTC

[...]

Post by Bart
I think 'const' is confusing for similar reasons that VLAs can be both
confusing and awkward to implement.
That's because both really apply to /types/, not directly to variables.

Sure. For example, given
const int n = 42;
n is of type `const int`, and &n is of type `const int*`. Of course
that implies that n itself is const.

   T x;           // defines a readonly variable (which probably needs
                  // initialising)
   T* y;          // defines a variable pointer
'const' is out of the picture.

You say T is an alias (what, a macro?) for 'const int', you show code
using T, and then you say "'const' is out of the picture". If you have
a point, it escapes me.

Well, can you see 'const' in my example? You can't tell x is readonly
by only looking at this.

Post by Keith Thompson
Yes, and you seem determined to make it easier to get mixed up.

That one is really simple - clearly "x" is declared "const", and so
you can't assign to it later.

Post by Bart
T const y;

That one is equally simple - clearly "y" is declared "const", and so
you can't assign to it later.
That shows exactly why it can be a good idea to use "typedef", even
for relatively simple things such as adding a pointer or qualifiers to
the type. I would not normally make a typedef just for a pointer, or
just to add a "const" qualifier, but if the final type is complicated,
it can make things clearer. Really, it's just like breaking a
complicated expression into parts with extra local variables to name
them.

Post by Bart
int* const z;

This one requires a little more thought, but it should be well within
the capacity of any C programmer to see that "z" is declared "const".

    int * const z1;
    int const * z2;
    z1=0;          // invalid
    z2=0;          // valid

Yes. This is C course lesson 1 stuff.

The type here comes in two parts - the pointer, and the thing being
pointed at (the "pointee", if that's a real word). So there are two
parts that could, independently, be qualified as "const" - thus it
matters whether the "const" is before or after the asterisk.

I don't disagree that there are other ways to make a language syntax to
specify types, or that some alternatives might be easier to follow than
C (such as always left-to-right, or always right-to-left).

But cases like this are not hard, and it is not at all unreasonable to
expect anyone who calls themselves a "C programmer" to handle those
examples without a moment's thought.

And if you get it muddled, the chances are very high that the compiler
will tell you when you try to compile the code (any compiler will
complain about assigning to a const, not just ones with good static
error checking).

Post by Bart
     x=0;
     y=0;
     z=0;
because I thought they would behave differently, with 'const' being the
opposite side of '*' to the base-type.

The "const" in each case clearly applies to the type of the declared
variable.

Post by Bart
I forgot that here it would be the right-most 'const' that controls
storage attributes of 'z'.

You invented a rule (that the right-most "const" controls the variable),
and it turned out to be wrong. In hindsight, that should be totally
obvious to you - if your rule had been true, there would be no way to
specify a non-const pointer to const data, since the first "const" would
also be the last one.

Instead, try to learn the rules of C, rather than making up incorrect
rules and confusing yourself.

Post by Bart
You will of course say that I'm the only person in the world who
could make that mistake.

C makes "const" part of the type - it is a type qualifier. It is also
possible for a language to make "constness" an attribute of the variable
but not of the type. The C language makes the design decision that
variables don't get extra attributes - qualifiers like "const" and
"volatile" must therefore be part of the type. That simplifies many
things in the language. (That design decision is independent of the
syntax used to specify types - it would apply even if C had used, say,
"const * (const int)" to specify "const pointer to type const int".)

Some languages do have declaration keywords that determine the constness
of the declared variable, yes. I don't know if the constness of the
variable is part of the variable's type or an extra attribute or
characteristic - that will depend on the language in question. If the
language allows you to specify arbitrarily complex pointer types and
their constness, you'd still need something equivalent to "const" and
"mutable" in multiple places - the initial keyword only covers the
outermost case.

But C does not have any keywords involved in declaring variables. So
having two keywords to distinguish "const" makes no sense for C. It
would also not work well, given that C has three type qualifiers -
"const", "volatile" and "_Atomic". ("restrict" is syntactically a type
qualifier, but is a bit different and best ignored here.) Would you
want to have other keywords for declaring "volatile", "volatile const",
and _Atomic combinations?

Having said that, I think it is nice for a language to be a little bit
more verbose and explicit than C, and I would prefer if there were
keywords for declaring and defining variables (and functions). Having
separate keywords for defining const and non-const variables would be a
complication to the language and rules, since it would have to come in
addition to supporting these traits in pointed-to types, but it would be
convenient in use - most variables, after all, are not pointers.

I suggested that C is confusing because 'const' looks as though it's
like the former, but it's part of the latter. Which also means you can
have multiple 'const' in a declaration (putting aside repeated 'const's).

You find C confusing because you are determined to find it confusing,
and are willing to invent incorrect "rules" to boost your confusion.

Yes, it is possible to write types in C that are hard to comprehend. It
is, however, easy to avoid doing so in C - just as it is possible to
write code that is hard to comprehend in any other language.

So objectively, it IS more complicated than elsewhere with more scope
for getting it wrong.

There is nothing objective in what you have written. We all understand
your /subjective/ opinion. Some people (occasionally even me) may agree
with some of your opinions - that does not make them objective.

But of course, this group being what it is, people have to turn it round
to make it about me: I'm deliberately trying to show confusing examples.

I would be disappointed if those were the most "confusing" examples you
could come up with!

Or I'm too thick to understand how const works.

I don't think you are too "thick" here. I think you are applying
considerable intelligence and thought into making it /appear/ that C's
syntax is difficult. You need only use a tiny fraction of that
intelligence to understand the C rules for type specifications well
enough to handle "int * const z1" without effort. (We all agree that
long and complex multi-part types are harder to follow.)

If you ran a business selling tools for your own languages, this would
all make sense. Since you don't do that, the biggest source of
confusion here is what motivation you have for your campaign of invented
FUD here.

Keith Thompson

2024-11-28 20:33:21 UTC

I think your comment applies for const in declarations like
const int i = 1;
I used to find const confusing, as it sometimes meant 'read-only' and
other times 'immutable.'

I'm not sure what you mean. My understanding is that const means
read-only, and nothing else.

We are *defining* the object `i`, which means that the declaration
(which is also a definition) causes storage to be allocated.

Post by Thiago Adams
But here
void f(const struct X * p);
We are not declaring the storage for the pointed object.

Right. But "const" means the same thing: the object `*p` is read-only.
More precisely, the expression `*p` gives us read-only access to that
object; there might be other expressions that give read/write access to
the same object.

If an object is const because of its definition, then that object is
itself read-only, and anything that bypasses that (pointer casts, etc.)
causes undefined behavior. If "const" appears in a declaration that
isn't a definition for the object, then the declaration provides a
read-only view of the object (if it exists). The object itself may or
may not be read-only.

int n = 42; // read/write
void func(const int *param); // *param provides a read-only view
func(&n);
// The object can be modified via the name "n", but not via
// the name "*param".

Post by Thiago Adams
So, for the first case, we can think const as declaring an immutable
storage, while for the second sample const acts as "read-only" - we
don't know if the storage is const or not.

"const storage" is an implementation detail, not part of C semantics.
A conforming implementation could put everything in writable storage
(perhaps the OS doesn't provide memory protection, or the compiler
authors are lazy), relying on C semantics to prevent writes to
const objects.

Post by Thiago Adams
Now, it seems less confusing to me. When const is used with variables
that can be initialized (init-declarator), it acts as 'immutable',
meaning the storage is constant.

What exactly do you mean by "the storage is constant"? Are you
talking about memory that is marked as read-only by the OS?

Sure, you can think of it that way, but it's not what "const"
*means*. Something like `*(int*)&i = 42;` has undefined behavior,
regardless of the implementation. If the implementation chooses
to store i in some kind of read-only memory (perhaps enforced by
the OS, perhaps physical RAM), it's likely to crash. If it stores
it in ordinary read/write memory, it will likely store 42 in i.
Reading i might yield 1 or 42, depending on optimizations (remember
that it's UB).

Post by Thiago Adams
But for local variables it does not make sense to have "read-only
marked memory" because it lives on stack.
int main(){
const int i = 1;
}

There's no fundamental reason an implementation couldn't have
"read-only marked memory" on the stack.

[...]

Post by Thiago Adams
const is very context dependent, maybe trying to reuse the same
keyword, and I think C23 had a change to clarify it, but instead made
it more confusing with constexpr, that was the point of my previous
topic.

const and constexpr are two very different things.

Do you have an example where "const means read-only" isn't enough to
understand the C semantics (leaving aside any implementation-specific
choices)?

(But note that given:
constexpr int n = 42;
`&n` is of type `const int*`. Which makes sense; we don't want `&n` to
give us write access to n.)

"const" does create a lot of opportunities for optimizations, like
initializing objects at load time rather than during execution. And a
clever compiler could implement the same optimizations if it's able to
prove that an object is never modified.

Post by Thiago Adams
For compile-time computation what matters is the guarantee that the
compiler knows the values (it knows because it is always the same value
of initialization) when using the object. (It does not depend on flow
analysis)

There's no requirement for the compiler to "know" the value of a const
object.

Post by Thiago Adams
I think const, like in here
const int i = 1;
gives the same guarantee. (The compiler knows the value of i)

That's a very common optimization, but a conforming compiler could
simply read the value stored in `i` on every reference to it.

Post by Thiago Adams
What I think could be explored more is the usage of register keyword
as meaning "no-storage".
The idea of const no-storage is good because it eliminates any problem
with object lifetime and it makes the perfect constants in my
view. Unfortunately, constexpr does not mean that because we can take
the address of constexpr object.

The fact that constexpr objects still have addresses is perhaps
a bit odd, but that's how C23 defines it. I like the idea of a
constexpr object with no associated storage, so that

constexpr int the_answer = 42;

does nothing more than make "the_answer" a name for the value 42,
but C23 doesn't have that. And I'm not sure it's all that important;
if we never refer to the address of the_answer, the compiler is
free to eliminate its storage. For most purposes, we can just
ignore the fact that a constexpr object has storage.

Post by Thiago Adams
Sample why no-storage is useful
void F()
{
register const int i = 1;
// let's say we have lambdas in C
f( []()
{
//safe to use i even in another thread, or even after exiting F
int k = i;
}
);
}

Without "register", since i is const, its value will never change
(barring undefined behavior), so it should be safe to use anyway.
How is eliminating the storage for i useful? You can just ignore
it, and the compiler may be able to optimize it away.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Thiago Adams

2024-11-29 12:30:37 UTC

Post by Keith Thompson
If an object is const because of its definition, then that object is
itself read-only, and anything that bypasses that (pointer casts, etc.)
causes undefined behavior.

Yes. This is my point. When the object itself cannot change, I used the
name immutable. And "read-only" when we don't know if the object is
immutable or not - like pointer to const object.
In any case, it is just a matter of definition. I think it is clear for
both of us. (I am not claiming these definitions are part of C standard)

There's no requirement for the compiler to "know" the value of a const
object.

When the expression is required to be constant expression like in switch
case, then the compiler must know the value.

Sorry if I am being repetitive, but here is my motivation to say that
const was already ready for that, no need for the new keyword constexpr.

Consider this sample

void f(const int a)
{
const int b = 1;

switch (a){
case a: break; // OPS
case b: break; // should be OK
}
}

The compiler does not know the value of 'a' even though it is declared as
constant object; on the other hand the compiler knows the value of 'b';

So, here is my point: in init-declarators, const and constexpr become
similar.

If lambdas were implemented in C, a decision has to be made about
capture. Is it allowed or not?
Objects with no storage could be allowed to be captured because this
will never imply lifetime problems, for instance if the lambda is
called by another thread.

C23 also added constexpr in compound literal.

(constexpr struct X ){ }

I also don't understand why not just use const for that.

I also allow static.

(static const struct X ){ }

I think in this case it makes sense.

Keith Thompson

2024-11-29 20:53:27 UTC

Post by Keith Thompson
If an object is const because of its definition, then that object is
itself read-only, and anything that bypasses that (pointer casts, etc.)
causes undefined behavior.

Yes. This is my point. When the object itself cannot change, I used
the name immutable. And "read-only" when we don't know if the object
is immutable or not - like pointer to const object.
In any case, it is just a matter of definition. I think it is clear
for both of us. (I am not claiming these definitions are part of C
standard)

There's no requirement for the compiler to "know" the value of a const
object.

When the expression is required to be constant expression like in
switch case, then the compiler must know the value.

True, but we weren't talking about constant expressions. We were
talking about objects of const type. Despite the obvious similarity of
the words "const" and "constant", they're really two different things.

And the name of an object (in the absence of constexpr) can't be a
constant expression. Given `const int zero = 0;`, the name `zero` is
not a constant expression and cannot be used in a case label. (There
are proposals to change that.)

Post by Thiago Adams
Sorry if I am being repetitive, but here is my motivation to say that
const was already ready for that, no need for the new keyword constexpr.
Consider this sample
void f(const int a)
{
const int b = 1;
switch (a){
case a: break; // OPS
case b: break; // should be OK
}
}
The compiler does not know the value of 'a' even though it is declared as
constant object; on the other hand the compiler knows the value of 'b';
So, here is my point: in init-declarators, const and constexpr
become similar.

That has been proposed, but I personally oppose it, and the language (as
of C23) doesn't support it. (C++ does.)

Given:

const int b = initializer;

the expression `b` would be a constant expression if and only if the
initializer is a constant expression (optionally enclosed in {...}, I
suppose). It's not always obvious whether an expression is constant or
not; you have to examine everything it refers to. And if you intend b
to be a constant expression but accidentally write a non-constant
initializer, it's still a perfectly valid declaration of a read-only
object initialized with the result of evaluating a run-time expression;
the error won't be flagged until you try to use it.

Recall that `const int r = rand();` is still perfectly valid.

But given:

constexpr int b = initializer;

you *know* that b can be used as a constant expression, and if the
initializer is not constant the compiler will flag it immediately.

This is already in C23. Dropping constexpr is politically impossible at
this point.

[...]

Post by Thiago Adams
C23 also added constexpr in compound literal.
(constexpr struct X ){ }
I also don't understand why not just use const for that.

Because constexpr and const mean different things.

Post by Thiago Adams
I also allow static.
(static const struct X ){ }
I think in this case it makes sense.

That's valid in C23.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Thiago Adams

2024-11-30 12:31:04 UTC

Post by Keith Thompson
If an object is const because of its definition, then that object is
itself read-only, and anything that bypasses that (pointer casts, etc.)
causes undefined behavior.

Yes. This is my point. When the object itself cannot change, I used
the name immutable. And "read-only" when we don't know if the object
is immutable or not - like pointer to const object.
In any case, it is just a matter of definition. I think it is clear
for both of us. (I am not claiming these definitions are part of C
standard)
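
The distinction can be sketched as follows. The names `read_through_view` and `view_sees_change` are hypothetical, and "read-only" and "immutable" are used in the informal sense described above, not as standard terms.

```c
/* Hypothetical helper: reads through a pointer to const. The qualifier only
   promises that this access path is not used for writing; it says nothing
   about whether the pointed-to object can change. */
int read_through_view(const int *view)
{
    return *view;
}

/* Returns 1 if the same read-only view observes two different values. */
int view_sees_change(void)
{
    int counter = 1;              /* ordinary, mutable object */
    const int *view = &counter;   /* "read-only" access path to it */
    int before = read_through_view(view);
    counter = 2;                  /* legal: the object itself is not const */
    int after = read_through_view(view);
    return before == 1 && after == 2;
}
```

Had `counter` been defined as `const int`, the object itself would be "immutable" in this terminology, and modifying it through any path would be undefined behavior.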

There's no requirement for the compiler to "know" the value of a const
object.

When an expression is required to be a constant expression, as in a
switch case label, then the compiler must know the value.

True, but we weren't talking about constant expressions. We were
talking about objects of const type. Despite the obvious similarity of
the words "const" and "constant", they're really two different things.
And the name of an object (in the absence of constexpr) can't be a
constant expression. Given `const int zero = 0;`, the name `zero` is
not a constant expression and cannot be used in a case label. (There
are proposals to change that.)
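
A minimal sketch of the case-label point, assuming nothing beyond C99. The names `classify` and `is_zero` are invented for illustration; the rejected label is left in a comment so that the block compiles.

```c
/* Enumeration constants are integer constant expressions in every C version. */
enum { ZERO = 0, ONE = 1 };

/* An object of const type: read-only, but its name is not an
   integer constant expression in ISO C (as of C23). */
const int zero = 0;

/* Hypothetical classifier: returns 0 or 1 for a matching label, -1 otherwise. */
int classify(int n)
{
    switch (n) {
    case ZERO:        /* OK: enumeration constant */
        return 0;
    case ONE:         /* OK: enumeration constant */
        return 1;
    /* case zero: */  /* not portable: 'zero' names an object, not a constant */
    default:
        return -1;
    }
}

/* Using the const object in an ordinary run-time comparison is always fine. */
int is_zero(int n)
{
    return n == zero;
}
```

Some compilers may accept the commented-out label as an extension, so whether it is diagnosed can depend on the compiler and its flags.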

That has been proposed, but I personally oppose it, and the language (as
of C23) doesn't support it. (C++ does.)
const int b = initializer;
the expression `b` would be a constant expression if and only if the
initializer is a constant expression (optionally enclosed in {...}, I
suppose).

Yes. Compound literals could also work like that, e.g.
(const int) {1}

Post by Keith Thompson
It's not always obvious whether an expression is constant or
not; you have to examine everything it refers to.

I think the compilers will check if the expression is constant anyway.

Post by Keith Thompson
And if you intend b
to be a constant expression but accidentally write a non-constant
initializer, it's still a perfectly valid declaration of a read-only
object initialized with the result of evaluating a run-time expression;
the error won't be flagged until you try to use it.
Recall that `const int r = rand();` is still perfectly valid.

yes.

Post by Keith Thompson
constexpr int b = initializer;
you *know* that b can be used as a constant expression, and if the
initializer is not constant the compiler will flag it immediately.

My guess is that compilers always check for constant expressions, even
for non-const objects. I just checked:

int main() {
    int i = 2147483647 * 2147483647;
}

GCC - warning: integer overflow in expression of type 'int' results in
'1' [-Woverflow]

CLANG - warning: overflow in expression; result is 1 with type 'int'
[-Winteger-overflow]
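
For reference, a short sketch of where the '1' in both warnings comes from, assuming a 32-bit int as on the platforms that produced them. The function names are hypothetical. The product is computed in long long (at least 64 bits), where it does not overflow.

```c
/* Same product, computed in a type wide enough to hold it.
   (2^31 - 1)^2 = 2^62 - 2^32 + 1 = 0x3FFFFFFF00000001 */
long long wide_product(void)
{
    return 2147483647LL * 2147483647LL;
}

/* Low 32 bits of the exact product: what a wrapped 32-bit result would hold. */
unsigned long low_32_bits(void)
{
    return (unsigned long)(wide_product() & 0xFFFFFFFFLL);
}
```

The exact product is 4611686014132420609, and its low 32 bits are 1, which matches the value both compilers report after folding the expression at compile time.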

Post by Keith Thompson
This is already in C23. Dropping constexpr is politically impossible at
this point.

Yes. But we can avoid it in new code if the same functionality is
possible with const.

Post by Thiago Adams
C23 also added constexpr in compound literal.
(constexpr struct X ){ }
I also don't understand why not just use const for that.

Because constexpr and const mean different things.

The difference for aggregates is that const can be used like constexpr
if all initializers are constant expressions. (I don't know what the
status of this is in C2Y, but it is just a logical consequence.)
Meanwhile, constexpr will check that all initializers are constant.

Apart from that, I don't see a difference. Both can have storage, for instance.
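
The storage point can be sketched for the const case using only C99 features. The names `named` and `const_objects_have_storage` are made up for this example; the constexpr case is not shown because it needs a C23 compiler.

```c
/* A named const object at file scope. */
static const int named = 1;

/* Both a named const object and a const-qualified compound literal are
   objects with storage, so their addresses can be taken. */
int const_objects_have_storage(void)
{
    const int *p = &named;            /* address of a named const object */
    const int *q = &(const int){1};   /* address of an unnamed const object */
    return *p + *q;                   /* reads both objects through the pointers */
}
```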