Struct Error

Post by bart
    struct vector;
    struct scenet;
    struct vector {
        double x;
        double y;
        double z;
    };
    struct scenet {
        struct vector center;
        double radius;
        struct scenet (*child)[];
    };

6.7.6.2p2: "The element type shall not be an incomplete or function type."

I have many draft versions of the C standard. n2912.pdf, dated
2022-06-08, says in 6.7.2.1p3 about struct types that "... the type is
incomplete [footnote 144] until immediately after the closing brace of
the list defining the content, and complete thereafter."

Therefore, struct scenet is not a complete type until the closing brace
of its declaration.

However, that sentence disappeared in n3047.pdf, dated 2022-08-04. Can
anyone tell me why it was removed? With it gone, I'm not sure it is
still considered an incomplete type.

Ignoring for the moment the fact that it's not permitted, why do you
want to do that? In C code, people usually use pointers to the first
element of an array rather than pointers to arrays. However, it
sometimes is a good idea to have a pointer to an array, because that
makes the length of the array part of the pointer type, which can be
used to check the validity of the code that uses that pointer to access
the elements of the array. But in this case, the length of the array is
unspecified, so there's no such benefit.
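
To make the point about array length being part of the pointer type concrete, here is a minimal sketch. The names `sum_n` and `sum3` are invented for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the first element: the length has to travel separately. */
static double sum_n(const double *first, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += first[i];
    return total;
}

/* Pointer to a whole array of 3 doubles: the length is part of the type,
   so sizeof can recover it, and passing the address of a double[4]
   would be a type mismatch that the compiler has to diagnose. */
static double sum3(double (*v)[3])
{
    size_t n = sizeof *v / sizeof (*v)[0];
    assert(n == 3);
    return sum_n(*v, n);
}
```

With `double (*child)[]` the `sizeof *v` trick is unavailable, because the pointed-to array type has no known length. That is the "no such benefit" case described above.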

m137

2025-01-23 03:49:07 UTC

Post by James Kuyper
I have many draft versions of the C standard. n2912.pdf, dated
2022-06-08, says in 6.7.2.1p3 about struct types that "... the type is
incomplete [footnote 144] until immediately after the closing brace of
the list defining the content, and complete thereafter."
Therefore, struct scenet is not a complete type until the closing brace
of its declaration.
However, that sentence disappeared in n3047.pdf, dated 2022-08-04. Can
anyone tell me why it was removed? With it gone, I'm not sure it is
still considered an incomplete type.

It seems to have been moved to N3047 6.2.5(25): "[...] A structure or
union type of unknown content (as described in 6.7.2.3) is an incomplete
type. [...]" (see here:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf#page=57).

--

Tim Rentsch

2025-01-23 07:15:09 UTC

Post by m137

I have many draft versions of the C standard. n2912.pdf, dated
2022-06-08, says in 6.7.2.1p3 about struct types that "... the type is
incomplete [footnote 144] until immediately after the closing brace of
the list defining the content, and complete thereafter."
Therefore, struct scenet is not a complete type until the closing brace
of its declaration.
However, that sentence disappeared in n3047.pdf, dated 2022-08-04. Can
anyone tell me why it was removed? With it gone, I'm not sure it is
still considered an incomplete type.

It seems to have been moved to N3047 6.2.5(25): "[...] A structure or
union type of unknown content (as described in 6.7.2.3) is an incomplete
type. [...]" (see here:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf#page=57).

That sentence isn't new in 6.2.5; it's been there since before C11.

James Kuyper

2025-01-23 08:37:24 UTC

Post by m137

It would be more accurate to say that it's been moved to 6.7.2.3,
concerning tags, which is then cross-referenced by 6.2.5:

"... the type (except enumerated types with a fixed underlying type) is
incomplete until immediately after the closing brace of the list
defining the content for the first time and complete thereafter."

It was a bit tricky to locate, because "unknown content" sounded like
the kind of term that the standard defines in one spot and uses
elsewhere, whereas in fact that term is used only once, and not defined
anywhere. Instead, it appears to be referring, by negation, to the
terms "structure content", "union content", and "enumeration content",
which are defined, but not until a few paragraphs later in 6.7.2.3.

Tim Rentsch

2025-01-23 07:31:31 UTC

Post by bart
    struct vector;
    struct scenet;
    struct vector {
        double x;
        double y;
        double z;
    };
    struct scenet {
        struct vector center;
        double radius;
        struct scenet (*child)[];
    };

That clause appears (slightly modified IIANM) in 6.7.2.3 p5.

bart

2025-01-23 10:54:10 UTC

Post by bart
    struct vector;
    struct scenet;
    struct vector {
        double x;
        double y;
        double z;
    };
    struct scenet {
        struct vector center;
        double radius;
        struct scenet (*child)[];
    };

Wouldn't this also be the case here:

struct scenet *child;
};

The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different from
the array version (actually, the code from my compiler is identical).

Post by James Kuyper
However, that sentence disappeared in n3047.pdf, dated 2022-08-04. Can
anyone tell me why it was removed? With it gone, I'm not sure it is
still considered an incomplete type.
Ignoring for the moment the fact that it's not permitted, why do you
want to do that? In C code, people usually use pointers to the first
element of an array rather than pointers to arrays. However, it
sometimes is a good idea to have a pointer to an array, because that
makes the length of the array part of the pointer type, which can be
used to check the validity of the code that uses that pointer to access
the elements of the array. But in this case, the length of the array is
unspecified, so there's no such benefit.

I said the code is generated, which means the original language uses a
pointer-to-array at that spot.

That is safer, as you can't mistakenly index a pointer which happens to
be a reference to a single instance.

In any case, you should surely be able to choose a T(*)[] type over T*
if you want, but apparently not inside a self-referential struct.
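
One possible workaround, sketched here under assumptions rather than taken from the thread: keep a plain `struct scenet *` member, which is allowed while the struct is still incomplete, and only name the pointer-to-array type after the closing brace. The typedef `scenet_array` and the accessor `scenet_children` are hypothetical names, and the cast relies on the usual conversion between a pointer to the first element and a pointer to the whole array.

```c
#include <assert.h>
#include <stddef.h>

struct vector { double x, y, z; };

struct scenet {
    struct vector center;
    double radius;
    struct scenet *child;   /* plain pointer: fine while the type is incomplete */
};

/* struct scenet is complete from here on, so an array of it can be named. */
typedef struct scenet scenet_array[];

/* Hypothetical accessor: view the child pointer as a pointer to an array. */
static scenet_array *scenet_children(struct scenet *s)
{
    assert(s != NULL);
    return (scenet_array *)s->child;
}
```

A generator could then emit `(*scenet_children(p))[i]` wherever the source language indexes through its pointer-to-array member.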

BGB

2025-01-23 20:58:56 UTC

    struct vector;
    struct scenet;
    struct vector {
        double x;
        double y;
        double z;
    };
    struct scenet {
        struct vector center;
        double radius;
        struct scenet (*child)[];
    };

struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different from
the array version (actually, the code from my compiler is identical).

Difference is, in this case, "sizeof(struct scenet)" is not relevant to
"sizeof(struct scenet *)".

Formally, with the parentheses and array, the size of the struct is
considered relevant (even if not strictly so), but is also unknown at
that point.
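
A small sketch of the first half of that claim: the size of a pointer to a struct is available even when the struct is never completed. The type names `opaque`, `small` and `large` and the helper `opaque_ptr_size` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct opaque;                     /* declared, never completed in this file */
struct small { char c; };
struct large { double d[64]; };

/* Compiles even though sizeof(struct opaque) itself would be an error. */
static size_t opaque_ptr_size(void)
{
    return sizeof(struct opaque *);
}

/* Pointers to structure types share one representation, so their sizes
   cannot depend on what the structs contain. */
static_assert(sizeof(struct small *) == sizeof(struct large *),
              "struct pointers are expected to have the same size");
```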

This seems like obscure edge case territory.

Alas, if I could have my way, I might define a simplified subset which
drops some of these sorts of edge cases (the form with parentheses would
simply become disallowed), but, likely, this wouldn't amount to much.

Say, for a language that:
    Is mostly backwards compatible with existing C code;
    Allows for smaller and simpler compilers;
    Uses some C#-like rules to eliminate the need for checking for
    typedefs to parse stuff.

Though, one can't go entirely over to C# like behavior if one still
wants to support traditional separate compilation (so one would still
have a need for things like function prototypes, header files, and a
traditional preprocessor).

But, then one would basically just end up with C but with people being
confused about why things like "unsigned x;" no longer work (making it
kinda moot).

And, most people continue to swear by GCC and Clang, unconcerned with
their multi MLOC codebases, and the overly long time it takes to
recompile the compiler from source...

People favoring "one compiler to rule them all" over, say, smaller and
more specialized compilers.

But, alas, I can't seem to manage to fit a C compiler into the code
footprint of the original Doom engine, which is a bit weak.

Say, if one wanted a C compiler with under 30 kLOC, and a binary
(covering the entire toolchain, *) in under around 500kB, ...

*: There can be a single binary that behaves as the entire toolchain,
and is symlinked to the various tool names. Which name it is called as
determining how it behaves and how it parses the command-line...
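
The symlink scheme in the footnote could look roughly like the sketch below. The tool names "cc", "as" and "ld", and the helpers `base_name` and `pick_tool`, are assumptions chosen for illustration; nothing here is taken from an actual toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum tool { TOOL_CC, TOOL_AS, TOOL_LD, TOOL_UNKNOWN };

/* Strip any leading directory so "/usr/bin/ld" and "ld" behave alike. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Decide which tool to act as from the name the binary was invoked by. */
static enum tool pick_tool(const char *argv0)
{
    assert(argv0 != NULL);
    const char *name = base_name(argv0);
    if (strcmp(name, "cc") == 0) return TOOL_CC;
    if (strcmp(name, "as") == 0) return TOOL_AS;
    if (strcmp(name, "ld") == 0) return TOOL_LD;
    return TOOL_UNKNOWN;
}
```

A `main` would call `pick_tool(argv[0])` first and then hand the remaining arguments to the matching command-line parser.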

Granted, my existing compiler is a bit bigger; sadly, its code footprint
is more on par with Quake3, and its memory footprint generally a bit
steep (well, if one wants to run it on an FPGA board with 128MB of total
RAM; ideally one wants to keep the memory footprint needed to compile a
moderate size program in under around 50MB or so; which is an epic fail
for my compiler as it is...).

And, as-is, compiling stuff takes a painfully long time on a 50MHz CPU
(even a moderately small program might take several minutes or more).

...

I said the code is generated, which means the original language uses a
pointer-to-array at that spot.
That is safer, as you can't mistakenly index a pointer which happens to
be a reference to a single instance.
In any case, you should surely be able to choose a T(*)[] type over T*
if you want, but apparently not inside a self-referential struct.

bart

2025-01-24 00:51:04 UTC

struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different
from the array version (actually, the code from my compiler is
identical).

Difference is, in this case, "sizeof(struct scenet)" is not relevant to
"sizeof(struct scenet *)".

No, both of these need to know the size of the struct when accessing the
i'th element:

....
struct scenet *childp;
struct scenet (*childa)[];
};

The only thing you can't do with x->childa is perform pointer arithmetic
on the whole pointer-to-array, since the array size is unknown. But doing
(*x->childa)[i] should be fine.
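
Outside of a self-referential struct, the difference between indexing the elements and doing arithmetic on the pointer-to-array itself can be sketched as follows. The function name `third_element` is invented for the example, and `int` stands in for the struct type.

```c
#include <assert.h>
#include <stddef.h>

/* pa points to an array of int whose length is not part of the type. */
static int third_element(int (*pa)[])
{
    assert(pa != NULL);
    /* *pa is an lvalue of type int[], which decays to int *, so ordinary
       element indexing works. By contrast, pa[1] or pa + 1 would have to
       step over a whole int[] of unknown size, and a conforming compiler
       has to diagnose that. */
    return (*pa)[2];
}
```

In C, then, element access through a pointer to an array of unknown size is spelled `(*p)[i]`, not `p[i]`.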

As is clear since other compilers (excluding those that slavishly copy
gcc's behaviour) have no problem with it.

Post by BGB
Formally, with the parentheses and array, the size of the struct is
considered relevant (even if not strictly so), but is also unknown at
that point.
This seems like obscure edge case territory.

It's a 'pointer to array'; it might be uncommon in C (because of its
fugly syntax), but it is hardly obscure!

Post by BGB
Alas, if I could have my way, I might define a simplified subset which
drops some of these sorts of edge cases (the form with parentheses would
simply become disallowed), but, likely, this wouldn't amount to much.

T(*)[] is a perfectly valid type; there is no reason to exclude it from
struct members.

It is unambiguous in my original language, and can also be in C.

Post by BGB
Mostly backwards compatible with existing C code;
Allows for smaller and simpler compilers;
Uses some C# like rules to eliminate the need for checking for typedefs
to parse stuff.
Though, one can't go entirely over to C# like behavior if one still
wants to support traditional separate compilation (so one would still
have a need for things like function prototypes, header files, and a
traditional preprocessor).
But, then one would basically just end up with C but with people being
confused about why things like "unsigned x;" no longer work (making it
kinda moot).
And, most people continue to swear by GCC and Clang, unconcerned with
their multi MLOC codebases, and the overly long time it takes to
recompile the compiler from source...

Yeah. I can choose to run my compiler from source each time it is
invoked; you barely notice the difference! (It adds 70-80ms.)

This cuts no ice here however.

Post by BGB
Granted, my existing compiler is a bit bigger; sadly, its code footprint
is more on par with Quake3, and its memory footprint generally a bit
steep (well, if one wants to run it on an FPGA board with 128MB of total
RAM; ideally one wants to keep the memory footprint needed to compile a
moderate size program in under around 50MB or so; which is an epic fail
for my compiler as it is...).
And, as-is, compiling stuff takes a painfully long time on a 50MHz CPU
(even a moderately small program might take several minutes or more).

You can't cross-compile on a PC?

BGB

2025-01-24 06:27:10 UTC

struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different
from the array version (actually, the code from my compiler is
identical).

Difference is, in this case, "sizeof(struct scenet)" is not relevant
to "sizeof(struct scenet *)".

No, both of these need to know the size of the struct when accessing the
i'th element:
....
struct scenet *childp;
struct scenet (*childa)[];
};
The only thing you can't do with x->childa is perform pointer arithmetic
on the whole pointer-to-array, since the array size is unknown. But doing
(*x->childa)[i] should be fine.
As is clear since other compilers (excluding those that slavishly copy
gcc's behaviour) have no problem with it.

I think it is more a case of formal definitions here...

It's a 'pointer to array'; it might be uncommon in C (because of its
fugly syntax), but it is hardly obscure!

In my own use, excluding function pointers, I almost never have a need
to use parentheses with declarations.

Post by BGB
Alas, if I could have my way, I might define a simplified subset which
drops some of these sorts of edge cases (the form with parentheses
would simply become disallowed), but, likely, this wouldn't amount to
much.

T(*)[] is a perfectly valid type; there is no reason to exclude it from
struct members.
It is unambiguous in my original language, and can also be in C.

I have a slight difference of opinion in that, if I were designing C, it
would not be allowed.

The merit of C is, in a way, that it has almost exactly what is needed:
little more, and little less.

Unlike, say, C++, which went down the rabbit hole of ever-increasing
complexity. None the less, it has still gained some complexities beyond
the bare minimum, and still has some weak points. Such as lacking a
standardized form of vector/SIMD extensions, or any way to have
customizable types (though, the latter point risks getting dangerously
close to C++ territory, so dunno).

Post by BGB
Mostly backwards compatible with existing C code;
Allows for smaller and simpler compilers;
Uses some C# like rules to eliminate the need for checking for
typedefs to parse stuff.
Though, one can't go entirely over to C# like behavior if one still
wants to support traditional separate compilation (so one would still
have a need for things like function prototypes, header files, and a
traditional preprocessor).
But, then one would basically just end up with C but with people being
confused about why things like "unsigned x;" no longer work (making it
kinda moot).
And, most people continue to swear by GCC and Clang, unconcerned with
their multi MLOC codebases, and the overly long time it takes to
recompile the compiler from source...

Yeah. I can choose to run my compiler from source each time it is
invoked; you barely notice the difference! (It adds 70-80ms.)
This cuts no ice here however.

Partial reason BGBCC still exists:
GCC and Clang are monstrosities (huge and slow to compile);
LCC offered very little over what I had already at the time;
TinyC didn't look like a particularly attractive starting point either.

However, as it is (having expanded significantly over the past some-odd
years), it can still be recompiled from source in a few seconds...

Whereas, rebuilding GCC is a good part of an hour, and LLVM+Clang
somehow manages to have build times measured in multiple hours (and, the
build times for Clang seem to get slower faster than computers are
getting faster).

Granted, one can speed it up some by temporarily disabling one's
antivirus software, but that one needs to start caring about things
like disabling AV software for faster build times in the first place
is still a problem...

Post by BGB
Granted, my existing compiler is a bit bigger; sadly, its code
footprint is more on par with Quake3, and its memory footprint
generally a bit steep (well, if one wants to run it on an FPGA board
with 128MB of total RAM; ideally one wants to keep the memory
footprint needed to compile a moderate size program in under around
50MB or so; which is an epic fail for my compiler as it is...).
And, as-is, compiling stuff takes a painfully long time on a 50MHz CPU
(even a moderately small program might take several minutes or more).

You can't cross-compile on a PC?

That is what I normally do, but it would be "nice" to have the option to
compile stuff natively from within the FPGA soft-processor or emulator.

But, to make this more practical would need a faster and lighter weight
compiler than what I have already.

Seemingly big issues:
Parsing an AST for a whole translation unit, eats a lot of RAM;
Decoding stuff into the internal 3AC IR, for a whole program at a time,
also eats a lot of RAM.

I had tried to look into designing a compiler with the preprocessor and
parser overlaid via a linked-list "line buffer" where, the preprocessor
would preprocess lines, put them in a linked list, and the parser would
consume them (freeing up each line once all tokens were consumed), and
then trying to drive the middle part of the compilation process one
top-level declaration at a time.
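
The "line buffer" idea can be pictured as a simple queue between the two stages. This is only a sketch of the general shape, not BGBCC code; the names `line_queue`, `line_queue_push` and `line_queue_pop` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct line_node {
    struct line_node *next;
    char *text;
};

struct line_queue {
    struct line_node *head;   /* next line the parser will consume */
    struct line_node *tail;   /* last line the preprocessor produced */
};

/* Preprocessor side: append a copy of one preprocessed line.
   Returns 0 on success, -1 if memory runs out. */
static int line_queue_push(struct line_queue *q, const char *text)
{
    assert(q != NULL && text != NULL);
    size_t len = strlen(text) + 1;
    struct line_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->text = malloc(len);
    if (n->text == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->text, text, len);
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    return 0;
}

/* Parser side: detach the oldest line. The caller frees the returned
   text once all of its tokens are consumed. Returns NULL when empty. */
static char *line_queue_pop(struct line_queue *q)
{
    assert(q != NULL);
    struct line_node *n = q->head;
    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    char *text = n->text;
    free(n);
    return text;
}
```

Memory use then tracks how far the preprocessor runs ahead of the parser, rather than the size of the whole preprocessed translation unit.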

This turned into more of a mess than I would have hoped.

My existing compiler runs the preprocessor first, and generates a text
buffer containing the entire preprocessed output, but this can sometimes
reach sizes in MB territory (mostly with all of the stuff pulled in from
headers, which will often dwarf the actual code in each translation unit).

Then, the parser is left churning through large numbers of things like
structs, typedefs, and function prototypes, before getting to the actual
code. Parsing all these into an AST eats time and memory.

While the AST is arguably very bulky, one can at least entirely discard
it after each translation unit (this is one use case for a zone
allocator; where one can allocate AST related memory in an AST zone and
free all of it after each translation unit). The steep up-front cost of
the preprocessor output can also be reduced slightly by "chunking" the
buffering, say, into multiples of 32kB or similar (as opposed to trying
to "malloc()" the whole 1MB or so in a single large buffer).
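
A zone allocator with chunked backing storage might be sketched as below. The 32kB chunk size follows the figure mentioned above; everything else (`zone`, `zone_alloc`, `zone_free_all`, the refusal of oversized requests) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ZONE_CHUNK_SIZE (32u * 1024u)   /* assumed chunk size */

struct zone_chunk {
    struct zone_chunk *next;
    size_t used;
    _Alignas(max_align_t) unsigned char data[ZONE_CHUNK_SIZE];
};

struct zone {
    struct zone_chunk *head;   /* most recently added chunk */
};

/* Bump-allocate from the current chunk, adding a new chunk when full. */
static void *zone_alloc(struct zone *z, size_t size)
{
    assert(z != NULL);
    const size_t align = _Alignof(max_align_t);
    if (size == 0 || size > ZONE_CHUNK_SIZE)
        return NULL;           /* oversized requests are out of scope here */
    size = (size + align - 1) / align * align;
    if (z->head == NULL || z->head->used + size > ZONE_CHUNK_SIZE) {
        struct zone_chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->next = z->head;
        c->used = 0;
        z->head = c;
    }
    void *p = z->head->data + z->head->used;
    z->head->used += size;
    return p;
}

/* Release every chunk at once, e.g. after finishing a translation unit. */
static void zone_free_all(struct zone *z)
{
    assert(z != NULL);
    while (z->head != NULL) {
        struct zone_chunk *next = z->head->next;
        free(z->head);
        z->head = next;
    }
}
```

Individual AST nodes are never freed one by one; the whole zone goes away in a single pass.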

Ideally, one then wants to leave the IL in a form where the compiler
doesn't need to load everything into 3AC form all at once, but my
existing IL design left little choice here. It was designed in a purely
linear structure with symbols managed by a sort of sliding array with an
LZ compression scheme, which means effectively the bytecode needs to be
decoded linearly and all at once.

Too many things that eat RAM.

Better would have been a structure where only a high-level view of the
metadata needs to be decoded up-front (and then possibly in a way that
allows a cache-like approach), and which similarly allows for decoding
the Stack-IL into 3AC incrementally (say, when we are actually compiling
the function in question).

But, it is also a question of how to pull things off in a memory-compact
way without re-introducing a lot of the limitations that existed in
1980s era compilers (say, for example, the compiler having no way to
know whether or not a given function is reachable within the call graph).

Say, if you decode the entire program into 3AC form all at once, it is
possible to do things like walk the entire program as a graph and trace
out what functions are reachable (and determine things like local vs
external visibility, etc). This sort of a thing would be much less
viable if one could only look at a single function at a time.
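
The whole-program reachability walk can be sketched as a depth-first marking pass over a call graph. The `func_node` layout and the `MAX_CALLEES` limit are invented for the example; a real compiler would more likely use an explicit worklist and variable-length callee lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4   /* arbitrary limit for this sketch */

struct func_node {
    size_t callees[MAX_CALLEES];   /* indices of the functions this one calls */
    size_t ncallees;
    bool reachable;
};

/* Mark root and everything it can call, directly or indirectly. */
static void mark_reachable(struct func_node *funcs, size_t nfuncs, size_t root)
{
    assert(root < nfuncs);
    if (funcs[root].reachable)
        return;                    /* already visited, which also breaks cycles */
    funcs[root].reachable = true;
    for (size_t i = 0; i < funcs[root].ncallees; i++)
        mark_reachable(funcs, nfuncs, funcs[root].callees[i]);
}
```

Functions left unmarked after walking from the entry points can be dropped, which is exactly the kind of decision that needs the whole program in memory at once.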

But, then, if one needs to burn, say, 64 bytes per 3AC operation (and
one may have on average several hundred 3AC ops per function, and
several thousand functions in a program), RAM cost adds up quickly.

Where, in BGBCC, generally each function would have a dense array of 3AC
operator structs, and another array of "traces" which give the starting
and ending index of each basic block, and some flags and similar.
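
For a rough sense of scale, the sketch below pairs a hypothetical 64-byte operation record with a trace record and a size estimate. The field layout is invented purely to reach 64 bytes and is not BGBCC's actual structure; the counts of 300 operations per function and 3000 functions are assumed stand-ins for "several hundred" and "several thousand".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3AC operation record, padded out to 64 bytes. */
struct tac_op {
    uint16_t opcode;
    uint16_t flags;
    uint32_t type;
    uint64_t dst, src_a, src_b;
    uint64_t imm;
    unsigned char reserved[24];
};
static_assert(sizeof(struct tac_op) == 64, "tac_op is expected to be 64 bytes");

/* One basic block: a range of indices into the function's op array. */
struct tac_trace {
    uint32_t first_op;
    uint32_t last_op;
    uint32_t flags;
};

/* Bytes needed for the op arrays alone, ignoring traces and strings. */
static unsigned long long tac_bytes(unsigned long long ops_per_func,
                                    unsigned long long nfuncs)
{
    return sizeof(struct tac_op) * ops_per_func * nfuncs;
}
```

With the assumed counts, `tac_bytes(300, 3000)` comes to 57,600,000 bytes, so the operation arrays alone would already exceed a 50MB budget.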

Things like 3AC nodes and string tables eating up lots of RAM.

But, the partial result of all of this is a compiler that has an
impractical memory footprint for an FPGA based soft processor (and is
also impractically slow).

Then again, my compiler is pretty slow even on my main PC. The amount of
time it takes being similar to that taken by GCC; which is kinda dead
slow if compared with MSVC. Seemingly, MSVC is somehow a very fast
compiler, with Clang sort of in-between (slower than MSVC, but still
faster than GCC).

Though, for actual compiled program performance, GCC tends to do pretty
well, and MSVC often worse. But, for some things, the reverse is true
(where the MSVC output is a lot faster than the GCC output).

...

But, as for ISA support on my processor (and supported by BGBCC), there
are currently several options:
BJX2 Baseline:
    Original form of my custom ISA;
    Primarily, it is a 32-register design, with 16/32/64/96 bit ops;
XG2:
    Newer variant of my ISA;
    Drops 16-bit ops, moves over to 6-bit register fields;
    Natively uses 64 GPRs;
    Has 32/64/96 bit encodings.
RISC-V (RV64G):
    Uses 5 bit register fields, with 32 GPRs;
        And, another 32 FPU registers.
    The CPU supports the 16-bit "C" extension, but BGBCC does not.
        With my design, the "C" ops come with a performance penalty.
    I have a jumbo-prefix extension that adds 64 and 96 bit encodings.
        Largely to improve performance.
        It works in essentially the same way as in my own ISA,
        and does similar things.
    Among a few other custom extensions.
XG3:
    Bit-repacked and modified version of my ISA;
    Can be "crazy glued" onto RV64G to make a sort of hybrid ISA.
    It implicitly "re-merges" the X and F registers,
        which were split in RV64G.
        But, more just that it goes back to what XG2 did...

Currently, performance:
    Plain RV64G is slower than both XG2 and XG3,
        including when compiled with "gcc -O3".
    Though, GCC is faster than BGBCC when targeting bare RV64G.
        BGBCC targeting plain RV64G: Kinda sucks...
    If I trick out the ISA, BGBCC is faster than GCC targeting RV64G.
        Dunno what would happen if GCC could use my ISA extensions...
    XG2 currently holds the speed prize...
    XG3 isn't quite as fast as XG2 at present, but has promise.

In theory, XG2 and XG3 should be basically equivalent, as they are (more
or less) the same ISA just with the bits shuffled around (mostly this
was to allow XG3 to coexist in the same opcode space as RV64G, replacing
the "C" extension's encoding space). In the process, I did slightly
improve the "aesthetics" of the encoding scheme.

There are some minor differences between them, mostly related to how
BGBCC is using the ISA, and the ABI (with XG3, it is using the RISC-V
ABI rules).

One thing that does minorly hurt BGBCC here is that it primarily uses
callee-save registers for local variables, where:
My native ABI is balanced in favor of slightly more callee save
registers than scratch registers;
The RISC-V ABI has more scratch registers than callee save registers;
So, when using the RISC-V ABI, there is more register pressure (and more
register spills).

Ironically, at the same time, the RISC-V ABI has fewer function argument
registers than XG2 (8 vs 16), and lacks argument spill space, which in
turn contributes towards making things "slightly less efficient".

Can't really "fix" the ABI for XG3 without causing binary compatibility
issues with calls to/from RV64G code, which would defeat the whole point
of why XG3 exists.

Having more scratch registers (and fewer callee save) is better for leaf
functions, but implicitly assumes that one is spending more of their
time in leaf functions (and comparably hurts performance if the program
is dominated more by going up and down the call stack in non-leaf
functions).

Though, arguably, one could make more use of scratch-registers within
non-leaf functions if one could be strategic about where the function
calls occur and when/where register spills are needed (but, my compiler
is not that clever; and mostly treats the scratch-registers as
off-limits for local variables within non-leaf functions).

Then again, debatably it could all be for nothing:
My fastest case is only around 40% faster than the "gcc -O3" output (for
programs like Doom and similar);
And, maybe, 40% isn't really enough to be worth the issues of a
non-standard ISA variant.

But, granted, it is closer to around 500% for OpenGL (trying to build my
OpenGL implementation with RV64 and GCC performs horribly). But, I kinda
needed to SIMD the crap out of this (plain RV64G lacks any form of SIMD
support).

...

David Brown

2025-01-24 08:45:19 UTC

Post by bart
No, both of these need to know the size of the struct when accessing the
i'th element:
....
struct scenet *childp;
struct scenet (*childa)[];
};
The only thing you can't do with x->childa is perform pointer arithmetic
on the whole pointer-to-array, since the array size is unknown. But doing
(*x->childa)[i] should be fine.
As is clear since other compilers (excluding those that slavishly copy
gcc's behaviour) have no problem with it.

This is one of these cases where the C language /could/ have been
defined to allow incomplete types to be used. But the language
definition (the standards) does not allow it.

Some C compilers do a poor job of checking C code. Some are lax and
allow things that the language does not, accepting code that breaks
syntax and grammar rules or constraints, and happily generating code
instead of issuing the standards-required diagnostics. (gcc in default
modes is such a compiler - you need to add appropriate flags to make it
do a decent job of enforcing standard checking.)

This raises a few questions :

1. Should future C standards be modified to be more lenient in the code
they accept? Was there a good reason for these limitations, and is that
reason still valid?

2. Should compilers behave this way, and accept code that can have an
"obvious" interpretation despite going against the C standards?

3. Should people write (or generate) C code that relies on the
non-standard behaviour of some C compilers?

Certainly some restrictions in the way C was defined were heavily
influenced by the compiler technology of the day. But others - and I
suspect this case is one of them - exist because they simplify the
rules. Writing rules that are detailed enough to allow everything that
could, theoretically, be supported while also excluding everything else
can get very complicated. A language standard cannot say "if the
compiler is smart enough to figure this out, that's okay". A given
compiler implementation can do that if it wants, but the standards cannot.

At first glance it sounds like a good idea for compilers to be flexible
in what they accept. But the risk for that is fragmentation of the
language. The whole point of standardising a language is to avoid that
- so that, to a reasonable level at least, it is possible to write code
that can be compiled with a large range of C compilers. Before C was
standardised, the language had already fragmented into lots of variants.
It is /not/ a good thing that compilers accept code that requires a
diagnostic, or have lots of extensions enabled by default. You should
have to explicitly enable such things with flags - instead of explicitly
requesting that the compiler conforms to the standards.

For some kinds of code, it is entirely appropriate to rely on extensions
or non-standard features of particular compilers. But this should be
done as an exception, when there is good reason for it.

In this particular case, there is not the slightest reason for
generating something non-standard. The correct response here is to get
mildly irritated that C does not support the syntax you hoped to
generate, feel smug that your own language /does/ support such a syntax,
then change the code generator to produce correct standard C code.

And gcc is entirely correct here. Other compilers that reject the
incorrect code are not "slavishly copying gcc's behaviour" - they are
correctly implementing the C standards. It is compilers that accept
this code, without requiring flags or other explicit "enable extension"
features, that are doing a poor job of being C compilers.

(gcc often does the wrong thing by default accepting bad code without
question, but in this case it is correct.)

Kaz Kylheku

2025-01-24 20:31:42 UTC

Post by David Brown

This is one of these cases where the C language /could/ have been
defined to allow incomplete types to be used. But the language
definition (the standards) does not allow it.

It does; the implementation can issue a required diagnostic,
and keep chugging along. The behavior becomes undefined, but
the same implementation can provide its own definition:
like such that when the type is completed by the time it
matters, it's all good.

The language definition only does not allow the implementation to be
called conforming if it doesn't diagnose the usage, and doesn't allow
the program's behavior to be well-defined ISO C.

Post by David Brown
1. Should future C standards be modified to be more lenient in the code
they accept? Was there a good reason for these limitations, and is that
reason still valid?

In this particular matter, GNU C++ accepts the code. If that happens to
be because of how ISO C++ is defined, then that carries substantial
weight. Why should C require a diagnostic in something that C++
allows to pass. (C++, whose C-like subset is touted as a "safer C"!)

Speaking of which, Bart never responded to the workaround I found,
namely that g++ accepts his code.

I'm guessing it's too abhorrent to even think about, like inviting the
occasional blasphemer into a satanic cult.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

bart

2025-01-24 22:53:45 UTC

Post by Kaz Kylheku

Post by David Brown

This is one of these cases where the C language /could/ have been
defined to allow incomplete types to be used. But the language
definition (the standards) does not allow it.

It does; the implementation can issue a required diagnostic,
and keep chugging along. The behavior becomes undefined, but
the same implementation can provide its own definition:
like such that when the type is completed by the time it
matters, it's all good.
The language definition only does not allow the implementation to be
called conforming if it doesn't diagnose the usage, and doesn't allow
the program's behavior to be well-defined ISO C.

Post by David Brown
1. Should future C standards be modified to be more lenient in the code
they accept? Was there a good reason for these limitations, and is that
reason still valid?

In this particular matter, GNU C++ accepts the code. If that happens to
be because of how ISO C++ is defined, then that carries substantial
weight. Why should C require a diagnostic in something that C++
allows to pass. (C++, whose C-like subset is touted as a "safer C"!)
Speaking of which, Bart never responded to the workaround I found,
namely that g++ accepts his code.

While gcc gives that one error in the complete program, g++ gives about
250 errors.

I just excluded that program from the set of benchmarks I was testing.

The C transpiler used is a deprecated product and I'm not about to start
messing with it now. I thought there might have been a simple tweak I
could have made manually.

James Kuyper

2025-01-25 01:53:41 UTC

...

Post by Kaz Kylheku

Post by David Brown
This is one of these cases where the C language /could/ have been
defined to allow incomplete types to be used. But the language
definition (the standards) does not allow it.

In that sense, the only thing that the C standard does disallow is
translation of a correctly formatted #error directive that survives
conditional compilation (an incorrectly formatted directive would give
the implementation permission to translate it).
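
For readers who have not met it, a `#error` directive that "survives conditional compilation" looks like the sketch below. The 8-bit-byte condition and the helper `bits_in_int` are arbitrary choices for illustration.

```c
#include <assert.h>
#include <limits.h>

/* On a platform where the condition holds, the directive survives
   preprocessing and the translation unit must not be translated. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* Only reached when the #error above was skipped. */
static int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Every other rule violation, by contrast, only obliges the implementation to issue a diagnostic, after which it is free to carry on translating.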

In more conventional usage, the C standard is said to allow something
only when its behavior is no worse than "unspecified". If the behavior
is undefined, the standard imposes no requirements, in which case the
implementation is permitted to give your translated code any arbitrary
behavior it is able to give it.

That only makes sense if you don't care what your program does - and if
that is the case, there's no reason to bother writing a new program -
just execute an arbitrary existing program. Whatever the behavior of
that arbitrary existing program is, it is behavior that would be
permitted for the translation of your program.

James Kuyper

2025-01-24 13:43:51 UTC

struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different
from the array version (actually, the code from my compiler is
identical).

Difference is, in this case, "sizeof(struct scenet)" is not relevant
to "sizeof(struct scenet *)".

The problem is not what you can do with the pointer, but the alignment
and representation of the pointer itself. Those are uniquely determined
for childp by the fact that it's a pointer to a struct type, regardless
of the content of that type (6.2.5p33). The alignment and representation
of childa, on the other hand, could depend upon the content of struct
scenet (in particular, the size of that struct type). It needs to know
those requirements, in order to complete the definition of struct
scenet. That's not an insoluble problem: just choose an alignment and
representation for childa that allows a pointer to the entire struct to
have that same alignment and representation - but the standard chose not
to mandate that it be solved.

If you've never tried to create a compiler for a platform where it makes
sense to have pointers to object types with different alignment and
representations, depending upon the type that they point at, you
probably have no idea what would be involved in solving that problem (I
certainly don't).

bart

2025-01-24 23:32:53 UTC

struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member. The calculation is not that different
from the array version (actually, the code from my compiler is
identical).

Difference is, in this case, "sizeof(struct scenet)" is not relevant
to "sizeof(struct scenet *)".

Since the vast majority of machines have no such problems, why should
this affect the users of those machines?

It seems perfectly reasonable to me to have a pointer to an unbounded
array (it's little different to void*), and to be able to use such a
pointer within a struct.

C already allows self-referential structs, so that shouldn't be a
problem either. The calculation required for indexing such an array
depends on the size of the struct, and this will be known by that point.

The context here is that of a language where that feature is
well-defined, using C as an intermediate representation, and compiling for
a machine where the required pointers are all well-defined (every
pointer, for code or data, is a 64-bit value).

So gcc's behaviour here is not helpful and not useful; what problem is
it trying to avoid that would cause chaos on my x64 platform?

The alignment of the struct should also not affect the code generated
for the array access, and itself should not be affected by that choice
of type.

Lawrence D'Oliveiro

2025-01-23 23:50:28 UTC

Post by bart
struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member.

No, because there is no pointer arithmetic involved in processing that
declaration.

Of course you will get a suitable error in a subsequent expression that
does involve such pointer arithmetic, if the struct has not been fully
defined by that point.

bart

2025-01-24 00:37:27 UTC

Post by Lawrence D'Oliveiro

Post by bart
struct scenet *child;
};
The struct is incomplete, but it still knows how to do pointer
arithmetic with that member.

No, because there is no pointer arithmetic involved in processing that
declaration.

Neither of these member declarations involve any expression:

....
struct scenet *childp;
struct scenet (*childa)[];
};

Yet childp is fine, but childa is a gcc compiler error. Why is that?

When expressions are used later on, both need to know the size of
struct, which has been determined by then.

Lawrence D'Oliveiro

2025-01-24 00:57:06 UTC

Post by bart
Yet childp is fine, but childa is a gcc compiler error. Why is that?

Because it needs to know the size of the type to work out the function
type’s calling convention.

Keith Thompson

2025-01-24 01:23:46 UTC

Post by Lawrence D'Oliveiro

[snipped context restored]

Post by bart
struct scenet *childp;
struct scenet (*childa)[];
};
Yet childp is fine, but childa is a gcc compiler error. Why is that?

Because it needs to know the size of the type to work out the function
type’s calling convention.

I restored the context you snipped. There is no function type in the
code Bart was asking about. childa is defined as a pointer to an
(incomplete) array of struct scenet.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

bart

2025-01-24 01:27:07 UTC

Post by Lawrence D'Oliveiro

Post by bart
Yet childp is fine, but childa is a gcc compiler error. Why is that?

Because it needs to know the size of the type to work out the function
type’s calling convention.

OK. So you've no idea what you're talking about. That's fine.

James Kuyper

2025-01-24 13:24:18 UTC

The difference is that there's an explicit requirement that the element
type of an array type be complete. As far as I know, there's no such
requirement that applies when you have a pointer to incomplete type,
rather than an array of an incomplete type. If you think otherwise,
please identify the requirement. I started reviewing all the places
where the standard says something about complete and incomplete types,
but there's way too many of them.

The reason, I think, is the following:
"A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.53) Similarly, pointers to
qualified or unqualified versions of compatible types shall have the
same representation and alignment requirements. All pointers to
structure types shall have the same representation and alignment
requirements as each other. All pointers to union types shall have the
same representation and alignment requirements as each other. Pointers
to other types need not have the same representation or alignment
requirements." (6.2.5p33)

This means that, in principle, the representation and alignment of a
pointer to an array of an incomplete struct type might depend upon the
unknown content of that struct type, if only through the size of the
type. So a pointer to an array of an incomplete type presents a possible
challenge that isn't a problem for a pointer to an object of that
incomplete type.

I'm sure that you're used to platforms where all pointers to object
types have the same representation and alignment requirements - most
developers are. However, there are very real platforms where that isn't
the case, and the C standard goes out of its way to permit conforming
implementations of C on such platforms.
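
A small sketch of what the quoted paragraph guarantees, for illustration only. The names hidden, small, large and hidden_ptr_size are made up for this example and do not come from the thread. Since all pointers to structure types share one representation, the size of such a pointer can be taken even while the struct is incomplete, and it cannot vary with the struct's content:

```c
#include <stddef.h>

struct hidden;                    /* incomplete: never defined in this file */

/* Allowed: the size of a pointer to a struct does not depend on
   the struct's (unknown) content. */
size_t hidden_ptr_size(void)
{
    return sizeof(struct hidden *);
}

struct small { char c; };
struct large { double d[64]; };

/* Pointers to these two very different structs must have the same
   representation, and therefore the same size. */
size_t small_ptr_size(void) { return sizeof(struct small *); }
size_t large_ptr_size(void) { return sizeof(struct large *); }
```

No comparable guarantee is quoted for pointers to arrays, which is the gap the explanation above relies on.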

Michael S

2025-01-24 14:37:40 UTC

On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

Just to point out if it was not said already: the problem is not related
specifically to recursive structures. It applies to arrays of
incomplete types in all circumstances.

struct bar;
struct bar (*bag)[]; // error
typedef struct bar (*bat)[]; // error

The case of the recursive structure is special only in the sense that it's
o.k. in C++, because [unlike C] in C++ a struct is considered complete
within its own body.

bart

2025-01-26 19:14:00 UTC

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

I don't think anyone has yet explained why that is an error (other than
C says it is), but not this:

struct bar *ptr;

This is a pointer to an incomplete type. Attempts to do ++ptr for
example will fail later on if that struct has not yet been defined.

So why not the same for the pointer-to-array versions?

It just doesn't make sense.

Is it just because such pointers HAVE to work, otherwise
self-referential structs become impossible? That would make it a hack,
in which case why not apply it to arrays too?

Post by Michael S
The case of the recursive structure is special only in the sense that it's
o.k. in C++, because [unlike C] in C++ a struct is considered complete
within its own body.

For non-recursive, you can choose to declare the pointer-to-array after
the struct has been fully defined.
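
A minimal sketch of that ordering for the non-recursive case, reusing struct bar and bag from the example above. The member id and the helper first_id are invented for illustration:

```c
struct bar;                  /* incomplete at this point */
/* struct bar (*bag)[];         would be rejected here: incomplete element type */

struct bar { int id; };      /* the type is complete after the closing brace */

struct bar (*bag)[];         /* accepted now: the element type is complete */

/* Index through the pointer-to-array; the array length is still unknown. */
int first_id(struct bar (*p)[])
{
    return (*p)[0].id;
}
```

Usage would be along the lines of `struct bar items[3]; bag = &items;`, since a pointer to an array of 3 is compatible with a pointer to an array of unspecified length.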

Michael S

2025-01-26 21:14:35 UTC

On Sun, 26 Jan 2025 19:14:00 +0000

Post by bart
Is it just because such pointers HAVE to work, otherwise
self-referential structs become impossible?

A more important use case for a pointer to an incomplete struct is
abstract types with the implementation completely hidden from the user.
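
For readers who have not met the idiom, a rough sketch of such an abstract type follows. The counter type and its three functions are hypothetical, chosen only for illustration. In real code the first part would live in a header and the second in a separate source file:

```c
#include <stdlib.h>

/* --- what a hypothetical header would expose --- */
struct counter;                          /* incomplete: content hidden */
struct counter *counter_new(int start);
int counter_next(struct counter *c);
void counter_free(struct counter *c);

/* --- what the implementation file would contain --- */
struct counter { int value; };           /* completed out of the user's sight */

struct counter *counter_new(int start)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

/* Return the current value, then advance it. */
int counter_next(struct counter *c) { return c->value++; }

void counter_free(struct counter *c) { free(c); }
```

User code only ever holds a struct counter *, so it compiles without knowing the layout, which is exactly what a pointer to an incomplete struct permits.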

Kaz Kylheku

2025-01-27 04:05:10 UTC

Post by bart
This is a pointer to an incomplete type. Attempts to do ++ptr for
example will fail later on if that struct has not yet been defined.
So why not the same for the pointer-to-array versions?
It just doesn't make sense.

You already know that GNU C++ silently accepts it, so this is
beating a dead horse.

Sure, something in a type not being specified is not a problem until the
information is actually needed for something. We can think about
a lazy type evaluation system. Functional programming languages
tend to have them.

But note that the rule /is/ actually consistent among aggregates.
Both an array and struct are aggregates. The elements are to
an array roughly the same thing that members are to a struct.
A struct may not have members of incomplete type.
An array may not have elements of incomplete type.

Your situation is this:

struct incomplete {
struct incomplete (*parray)[];
};

If we make a pointer to a struct rather than array,
it's the same kind of problem:

struct incomplete {
struct nested_incomplete {
struct incomplete memb;
} *pstruct;
};

In both cases, we have a pointer to something which
has an element, or member, of the incomplete type of
the outer struct which is to contain the pointer.

If the array version should work, so should the
struct version.

Post by Michael S
The case of the recursive structure is special only in the sense that it's
o.k. in C++, because [unlike C] in C++ a struct is considered complete
within its own body.

For non-recursive, you can choose to declare the pointer-to-array after
the struct has been fully defined.

If a C++ struct is complete within its own body, that means this should
be possible:

struct foo {
struct foo x;
int y;
};

That cannot be the reason why the pointer to array works in GNU C++.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

bart

2025-01-27 20:19:13 UTC

Post by Kaz Kylheku

You already know that GNU C++ silently accepts it, so this is
beating a dead horse.

C++ is no good to me. My old transpiler (now a deprecated product
anyway) generated C. Fixing it either means updating the transpiler (and
finding some hacky workaround like using casts everywhere) or doing that
manually to the generated C. Neither appeals.
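
For what it's worth, one possible shape of the cast-based workaround is sketched below. It is not what the transpiler emits. The typedef scenet_array and the helpers children_of and child_radius are invented here. The member is stored as a plain struct scenet *, and the pointer-to-array view is recovered by a cast once the struct is complete. The conversion is expected to behave on mainstream platforms where all object pointers share one representation, but that is an assumption, not something the standard spells out for this cast:

```c
#include <stddef.h>

struct vector { double x, y, z; };

struct scenet {
    struct vector center;
    double radius;
    struct scenet *child;     /* plain pointer: fine while the type is incomplete */
};

/* The struct is complete here, so an array of it may be named. */
typedef struct scenet scenet_array[];

/* Recover the pointer-to-array view (assumption: the cast preserves the address). */
scenet_array *children_of(struct scenet *s)
{
    return (scenet_array *)s->child;
}

/* Index the child array through the recovered view. */
double child_radius(struct scenet *s, size_t i)
{
    return (*children_of(s))[i].radius;
}
```

The cast is confined to one helper, which at least avoids scattering casts across every access.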

Post by Kaz Kylheku
struct incomplete {
struct incomplete (*parray)[];
};
If we make a pointer to a struct rather than array,
it's the same kind of problem:
struct incomplete {
struct nested_incomplete {
struct incomplete memb;
} *pstruct;
};
In both cases, we have a pointer to something which
has an element, or member, of the incomplete type of
the outer struct which is to contain the pointer.

It's a trickier problem: I'm not sure myself what the size should be,
whereas that was easy to see with my array example. Here even TCC
reports it as incomplete. However, my C compiler tells me the size is 8
bytes, which sounds reasonable given that the only concrete member in
there is one 64-bit pointer.

Post by Kaz Kylheku
If the array version should work, so should the
struct version.

Post by Michael S
The case of the recursive structure is special only in the sense that it's
o.k. in C++, because [unlike C] in C++ a struct is considered complete
within its own body.

For non-recursive, you can choose to declare the pointer-to-array after
the struct has been fully defined.

If a C++ struct is complete within its own body, that means this should
be possible:
struct foo {
struct foo x;
int y;
};

This one seems impossible, and even C++ fails it. Because you're
directly including an actual struct within itself.

Still, my compiler is not bothered by it! It gives an overall size of 16
bytes and an offset for both x and y of 0. That embedded (incomplete)
version of struct foo uses the wrong size.

A similar example in my language gives a recursion failure.

However, examples like the ones in OP are well-defined: the member
involved is a single pointer of a fixed size.

Tim Rentsch

2025-01-29 10:59:50 UTC

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

I don't think anyone has yet explained why that is an error (other than
C says it is), but not this:
struct bar *ptr;
This is a pointer to an incomplete type. Attempts to do ++ptr
for example will fail later on if that struct has not yet been
defined.
So why not the same for the pointer-to-array versions?

The question you should be asking is why did the original C
standards body make the rule they did?

The answer might be because this exception to a simple and
general rule is almost never useful, and never necessary.

Considering that it has been 35 years since that original rule
was made, and 2025 is the first time the question has come up,
the indications are that the original decision was a good one.

bart

2025-01-29 11:36:21 UTC

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

Well, you never see such a thing in use, certainly. I wonder why that is!

When a language outlaws some particular construction, forcing people to
stick to a particular idiom (the common use of a T* type to work with
pointers and arrays instead of the more sensible and safer T(*)[]), then
clearly you're not going to see such uses in the field.

Although there are really two parts to it: use of T(*)[] generally
(outside of self-referential structs) is allowed, but that is still
rare, presumably because the syntax is too unwieldy. Or people simply
don't know about it, since everyone uses T*.
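
To illustrate the two parts, here is a small sketch with made-up function names count4 and first_of. With a known length, T(*)[N] carries the element count in the type. With an unspecified length, T(*)[] still documents that the target is an array but carries no count:

```c
#include <stddef.h>

/* Pointer to an array of exactly 4 ints: the length is part of the type,
   so passing &arr for an int arr[5] would draw a diagnostic. */
size_t count4(int (*p)[4])
{
    return sizeof *p / sizeof (*p)[0];   /* element count taken from the type */
}

/* Pointer to an array of unspecified length: indexing works, but
   sizeof *p would not compile because the array type is incomplete. */
int first_of(int (*p)[])
{
    return (*p)[0];
}
```

Both accept `&a` for an `int a[4]`, because an array of 4 is compatible with an array of unspecified length.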

My use-case was within generated code, so that aspect was not relevant.

Post by Tim Rentsch
Considering that it has been 35 years since that original rule
was made, and 2025 is the first time the question has come up,
the indications are that the original decision was a good one.

We don't know that. Perhaps it comes up all the time, people realise
they can't use such a construct, and use a different approach.

Tim Rentsch

2025-01-30 19:51:56 UTC

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

Well, you never see such a thing in use, certainly. I wonder why
that is!
When a language outlaws some particular construction, forcing
people to stick to a particular idiom (the common use of a T* type
to work with pointers and arrays instead of the more sensible and
safer T(*)[]), then clearly you're not going to see such uses in
the field.
Although there are really two parts to it: use of T(*)[]
generally (outside of self-referential structs) is allowed, but
that is still rare, presumably because the syntax is too unwieldy.
Or people simply don't know about it, since everyone uses T*.

I didn't say this use case isn't used. I said this use case is
almost never useful.

We don't know that. Perhaps it comes up all the time, people
realise they can't use such a construct, and use a different
approach.

We don't know that there has never been a person 50 feet tall
either, but that doesn't mean there has been one.

Richard Damon

2025-01-29 12:32:26 UTC

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

The question you should be asking is why did the original C
standards body make the rule they did?

My guess is that it makes the simplest implementation of a C compiler
much more complicated. While I don't think it has been explicitly
stated, one goal of the original language, apparently kept by the
Standards Committee, has been to make the language fairly simple to
process to get working code. Optimizing to make it fast might take more
work, but making your first compiler for a system should be
straightforward. I believe a C compiler can still be done with a single pass
through the source code, with limited lookahead, and only the final
"link" step needs to be able to handle large chunks of the program.

Allowing the pointer-to-array type to be based on an incomplete type
might make this goal harder.

Post by Tim Rentsch
The answer might be because this exception to a simple and
general rule is almost never useful, and never necessary.
Considering that it has been 35 years since that original rule
was made, and 2025 is the first time the question has come up,
the indications are that the original decision was a good one.

Tim Rentsch

2025-01-29 15:52:33 UTC

Post by Richard Damon

Post by Michael S
On Thu, 23 Jan 2025 10:54:10 +0000

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};

struct scenet *child;
};

The question you should be asking is why did the original C
standards body make the rule they did?

My guess is that it makes the simplest implementation of a C compiler
much more complicated. While I don't think it has been explicitly
stated, one goal of the original language, apparently kept by the
Standards Committee, has been to make the language fairly simple to
process to get working code. Optimizing to make it fast might take
more work, but making your first compiler for a system should be
straightforward. I believe a C compiler can still be done with a
single pass through the source code, with limited lookahead, and only
the final "link" step needs to be able to handle large chunks of the
program.
Allowing the pointer-to-array type to be based on an incomplete type
might make this goal harder.

Possibly. I suspect the question was never considered, simply
because it never came up. It's unusual even to have a pointer to
an array with unknown extent, and an array with an incomplete
element type is an even weirder beast. It's easy to believe that
the peculiar circumstances of the situation being asked about
just never occurred to anyone. Given that, the simple rule in
the C standard has an obvious and natural appeal.

Tim Rentsch

2025-01-23 07:11:54 UTC

Post by bart
struct vector;
struct scenet;
struct vector {
double x;
double y;
double z;
};
struct scenet {
struct vector center;
double radius;
struct scenet (*child)[];
};
error: array type has incomplete element type 'struct scenet'
struct scenet (*child)[];
^~~~~
Is there any way to fix this, or is it not possible?
(This comes from generated code. Idiomatic C would use a T* here
rather than T(*)[], but that is not an option. Other compilers like
tcc, DMC and mine have no problem with it.)

The code shown violates a constraint in the C standard, because
the element type of the array declarator is an incomplete type
at the point the 'child' member is declared, so a diagnostic
is required.

Andrey Tarasevich

2025-02-03 04:35:59 UTC