Discussion:
Results of survey re. a new array size operator
Alexis
2025-01-24 06:57:22 UTC
Hi all,

JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
of a survey re. the preferred form of a new array size operator:

"There is a clear preference for a lowercase keyword, here, though it is
not by the biggest margin. One would imagine that with the way we keep
standardizing things since C89 (starting with _Keyword and then adding a
header with a macro) that C folks would be overwhelmingly in favor of
simply continuing that style. The graph here, however, tells a different
story: while there’s a large contingency that clearly hates having
_Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that
comes out on top! It’s just having a plain lowercase keyword, instead."

-- https://thephd.dev/the-big-array-size-survey-for-c-results


Alexis.
Michael S
2025-01-24 11:56:23 UTC
On Fri, 24 Jan 2025 17:57:22 +1100
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
"There is a clear preference for a lowercase keyword, here, though it
is not by the biggest margin. One would imagine that with the way we
keep standardizing things since C89 (starting with _Keyword and then
adding a header with a macro) that C folks would be overwhelmingly in
favor of simply continuing that style. The graph here, however, tells
a different story: while there’s a large contingency that clearly
hates having _Keyword by itself, it’s not the _Keyword + stdkeyword.h
macro that comes out on top! It’s just having a plain lowercase
keyword, instead."
-- https://thephd.dev/the-big-array-size-survey-for-c-results
Alexis.
Majority is wrong. What's new?
In the absence of BDFL we have to live with it.
Scott Lurndal
2025-01-24 14:16:16 UTC
Post by Michael S
On Fri, 24 Jan 2025 17:57:22 +1100
Post by Alexis
Hi all,

JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size

"There is a clear preference for a lowercase keyword, here, though it
is not by the biggest margin. One would imagine that with the way we
keep standardizing things since C89 (starting with _Keyword and then
adding a header with a macro) that C folks would be overwhelmingly in
favor of simply continuing that style. The graph here, however, tells
a different story: while there's a large contingency that clearly
hates having _Keyword by itself, it's not the _Keyword + stdkeyword.h
macro that comes out on top! It's just having a plain lowercase
keyword, instead."

-- https://thephd.dev/the-big-array-size-survey-for-c-results

Alexis.
Majority is wrong. What's new?
Actually, the entire article is bogus. There's no need for
some new keyword to replace the code that's been used for
half a century to size a statically allocated array.

Using the phrase 'debate perverts' in an attempt to deflect
criticism pretty much eliminates authorial credibility.

int arfarf[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
return sizeof(arfarf) / sizeof(arfarf[0]);
Kaz Kylheku
2025-01-24 20:10:10 UTC
Post by Scott Lurndal
Post by Michael S
On Fri, 24 Jan 2025 17:57:22 +1100
Post by Alexis
Hi all,

JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size

"There is a clear preference for a lowercase keyword, here, though it
is not by the biggest margin. One would imagine that with the way we
keep standardizing things since C89 (starting with _Keyword and then
adding a header with a macro) that C folks would be overwhelmingly in
favor of simply continuing that style. The graph here, however, tells
a different story: while there's a large contingency that clearly
hates having _Keyword by itself, it's not the _Keyword + stdkeyword.h
macro that comes out on top! It's just having a plain lowercase
keyword, instead."

-- https://thephd.dev/the-big-array-size-survey-for-c-results

Alexis.
Majority is wrong. What's new?
Actually, the entire article is bogus. There's no need for
some new keyword to replace the code that's been used for
half a century to size a statically allocated array.
Using the phrase 'debate perverts' in an attempt to deflect
criticism pretty much eliminates authorial credibility.
int arfarf[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
return sizeof(arfarf) / sizeof(arfarf[0]);
You can define

#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))

the only problem is that this bug sometimes happens:

arraysize (ptr)

however, this can be diagnosed. A compiler can look
for occurrences of a match for the abstract syntax
tree pattern:

sizeof (x) / sizeof (*((x) + 0))

where x is other than an array type, and issue a warning.

$ cat > arraysize.c
#include <stdio.h>

int main(int argc, char **argv)
{
printf("array size of argv, wrong = %zd\n",
sizeof (argv) / sizeof (argv[0]));
return 0;
}
$ make CFLAGS="-Wall -W -O2" arraysize
cc -Wall -W -O2 arraysize.c -o arraysize
arraysize.c: In function 'main':
arraysize.c:6:26: warning: division 'sizeof (char **) / sizeof (char *)' does not compute the number of array elements [-Wsizeof-pointer-div]
6 | sizeof (argv) / sizeof (argv[0]));
| ^
arraysize.c:3:27: note: first 'sizeof' operand was declared here
3 | int main(int argc, char **argv)
| ~~~~~~~^~~~
arraysize.c:3:14: warning: unused parameter 'argc' [-Wunused-parameter]
3 | int main(int argc, char **argv)
| ~~~~^~~~


Thus, this is a solved problem, except for everyone reinventing
their own name for the array size macro.

It was enough of a problem for everyone reinventing their own name
for a 32 bit unsigned integer that we got uint32_t.

I'd be in favor of <stddef.h> incorporating an arraysize
macro similar to how it has offsetof.
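Such a macro might look like the sketch below. The name arraysize and its placement in <stddef.h> are hypothetical (this is what Kaz is proposing, not anything the standard defines), and the optional type check relies on a GNU C builtin:

```c
#include <stddef.h>

/* Hypothetical arraysize macro as it might appear in <stddef.h>.
   The name and placement are assumptions, not current standard C. */
#define arraysize(a) (sizeof (a) / sizeof ((a)[0]))

#if defined(__GNUC__)
/* Optional hardening (GNU C only): evaluates to 0, but fails to
   compile when `a` is a pointer rather than an array, because the
   char array then gets a negative size.  This is the same trick
   used by the Linux kernel's ARRAY_SIZE macro. */
#define must_be_array(a) \
    (0 * sizeof (char [1 - 2 * __builtin_types_compatible_p( \
        __typeof__(a), __typeof__(&(a)[0]))]))
#undef arraysize
#define arraysize(a) (sizeof (a) / sizeof ((a)[0]) + must_be_array(a))
#endif
```

With this definition, `int v[7]; arraysize(v)` yields 7, while `arraysize(&v[0])` is rejected at compile time under GNU C rather than silently computing a wrong answer.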

This macro definition would be hidden unless the compiler operator
selects the dialect of C in which it became available,
or a newer one.

The standard can have a footnote encouraging implementors to
diagnose sizeof (ptr) / sizeof (ptr[0]).

It could be a constraint violation in the division operator, e.g.

Constraints [ proposed ]
...
When the left operand of / is an expression of the form sizeof expr, or
sizeof (type), where the size obtained is of a type pointer to T,
the right operand shall not be an expression of the form sizeof expr
or sizeof (type), where the size obtained is of a type T.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Scott Lurndal
2025-01-24 22:12:22 UTC
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Michael S
On Fri, 24 Jan 2025 17:57:22 +1100
Post by Alexis
Hi all,

JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size

"There is a clear preference for a lowercase keyword, here, though it
is not by the biggest margin. One would imagine that with the way we
keep standardizing things since C89 (starting with _Keyword and then
adding a header with a macro) that C folks would be overwhelmingly in
favor of simply continuing that style. The graph here, however, tells
a different story: while there's a large contingency that clearly
hates having _Keyword by itself, it's not the _Keyword + stdkeyword.h
macro that comes out on top! It's just having a plain lowercase
keyword, instead."

-- https://thephd.dev/the-big-array-size-survey-for-c-results

Alexis.
Majority is wrong. What's new?
Actually, the entire article is bogus. There's no need for
some new keyword to replace the code that's been used for
half a century to size a statically allocated array.
Using the phrase 'debate perverts' in an attempt to deflect
criticism pretty much eliminates authorial credibility.
int arfarf[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
return sizeof(arfarf) / sizeof(arfarf[0]);
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to. Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable). And the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.

My pattern, since 1979, is:

static const char *stop_types[] = {
    "Running", "Console Stop", "Halt Branch",
    "Halt Breakpoint", "IPC Instruction", "Memory Breakpoint",
    "Instruction Breakpoint", "Instruction Step", "RED LIGHT"
};
static const size_t num_stop_types = sizeof(stop_types)/sizeof(stop_types[0]);
Kaz Kylheku
2025-01-25 00:57:25 UTC
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
The repetition in things like:

sizeof foo->bar.buf / sizeof *foo->bar.buf

is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the name
for the macro and sticking it into, for instance, <stddef.h>.

If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.

The containerof name for a popular macro seems to be a de facto
standard, luckily.
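For readers unfamiliar with it, a common shape for that macro is sketched below (the exact spelling varies between codebases; the struct names here are purely illustrative):

```c
#include <stddef.h>

/* containerof: recover a pointer to an enclosing struct from a
   pointer to one of its members, by using offsetof to back up from
   the member to the start of the struct. */
#define containerof(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct node   { struct node *next; };
struct widget { int id; struct node link; };

/* Example: given only a `struct node *n` known to be the `link`
   member of some widget, containerof(n, struct widget, link)
   yields a pointer to the widget itself. */
```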
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2025-01-29 09:48:46 UTC
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
bart
2025-01-29 11:45:47 UTC
Post by Tim Rentsch
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
Can't it? The various versions I've seen, including mine, look like this:

#define offsetof(a,b) (size_t) &( ((a*)0) -> b)

As for the other point that was made, when looking at open source code,
every other program seems to contain macros like

MAX
ARRAYLEN
STREQ

But with assorted spellings (the first program I looked at today used
MZ_MAX).

However, every other program also seems to use typedefs to define their
own versions of INT32 etc, even with the stdint.h types being available
for 25 years.

So being in the standard is not enough if the official name is too ugly
or it is fiddly to type.
Michael S
2025-01-29 12:24:30 UTC
On Wed, 29 Jan 2025 11:45:47 +0000
Post by bart
Post by Tim Rentsch
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
#define offsetof(a,b) (size_t) &( ((a*)0) -> b)
The magic of the Standard Library, which is above the UB rules of
mortals. A macro like the above is blessed as long as it resides in
stddef.h, but damned in nonstddef.h.

OTOH, '#define ARRAY_LEN(x) (sizeof((x))/sizeof((x)[0]))' is o.k.
everywhere.
Richard Damon
2025-01-29 12:24:46 UTC
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
  #define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
   sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome.  Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy.  A macro to compute the number of
elements in an array can be done in standard C.  The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
  #define offsetof(a,b) (size_t) &( ((a*)0) -> b)
Which has undefined behavior: the dereferencing of a null pointer.

Only if the implementation defines that behavior to be what we want can
that be done. On most implementations, that sort of behavior happens to
work out, but it isn't mandated by the Standard.
As for the other point that was made, when looking at open source code,
every other program seems to contain macros like
  MAX
  ARRAYLEN
  STREQ
But with assorted spellings (the first program I looked at today used
MZ_MAX).
However, every other program also seems to use typedefs to define their
own versions of INT32 etc, even with stdint.h type being available for
25 years.
So being in the standard is not enough if the official name is too ugly
or it is fiddly to type.
Tim Rentsch
2025-01-29 16:01:13 UTC
Post by Richard Damon
Post by Tim Rentsch
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
Can't it? The various versions I've seen, including mine, look
#define offsetof(a,b) (size_t) &( ((a*)0) -> b)
Which has undefined behavior, the dereferencing of a null pointer.
Only if the implementation defines that behavior to be what we want,
can that be done. On most implementations, that sort of behavior turns
out to work, but it isn't mandated by the Standard.
Undefined behavior of the pointer dereference isn't the only
problem. Whatever comes out of the offsetof() macro has to be an
integer constant expression. To do that, the implementation needs
to take advantage of the provision in 6.6 p10 that allows an
implementation to accept other forms of constant expressions. In
fact the particular case of offsetof() taking advantage of this
provision, to create an integer constant expression, has been
confirmed in a response to a Defect Report (sorry, I don't remember
which one).
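The integer-constant-expression requirement Tim describes is visible in contexts like these, where a value computed at run time would be rejected (struct names here are illustrative):

```c
#include <stddef.h>

struct s { char tag; double payload; };

/* offsetof must expand to an integer constant expression, so it is
   usable in places an ordinary runtime expression is not, e.g. as a
   file-scope array size or in a C11 static assertion: */
static char scratch[offsetof(struct s, payload)];

_Static_assert(offsetof(struct s, tag) == 0,
               "a struct's first member is at offset zero");
```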
James Kuyper
2025-01-29 16:09:39 UTC
On Wed, 29 Jan 2025 11:45:47 +0000
...
Post by bart
Post by Tim Rentsch
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
#define offsetof(a,b) (size_t) &( ((a*)0) -> b)
The semantics of the "->" operator specify that

"The value is that of the named member of the object to which the first
expression points..." (6.5.2.3p4).

There can be no such object, because

"... a null pointer, is guaranteed to compare unequal to a pointer to
any object ..." (6.3.2.3p3).

Since there is no explicitly defined behavior for such an expression,
the behavior is implicitly undefined. On many platforms it will work
exactly as you expect, but not all.

Even on platforms where that part works, this code relies upon the
assumption that the result of that conversion will be the distance from
the beginning of the struct to the start of the specified object. That
seems to be based upon the assumption that a null pointer points at
address 0, and that addresses increase by one for each byte in the
object, and that the conversion to size_t converts a pointer value into
the corresponding address. All of those assumptions are valid on many
platforms, but none of them are guaranteed by the standard. "Any pointer
type may be converted to an integer type. Except as previously
specified, the result is implementation-defined." (6.3.2.3p6). So this
definition for the offsetof() macro, while a valid one on many
platforms, is not standard C.

That's why offsetof() is a standard macro with implementation-specific
expansion - on many platforms, the above expansion won't work.
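In practice, modern implementations sidestep the problem entirely: GCC and Clang, for example, define the macro in terms of a compiler builtin rather than the null-pointer expression. A rough sketch (named my_offsetof here to avoid clashing with the real macro from <stddef.h>):

```c
/* How a typical <stddef.h> defines offsetof today under GCC/Clang:
   the builtin computes the member offset inside the compiler, so no
   null pointer is ever dereferenced and the result is a proper
   integer constant expression. */
#define my_offsetof(type, member) __builtin_offsetof(type, member)

struct s { char c; double d; };
/* my_offsetof(struct s, c) == 0 is guaranteed by the standard; the
   offset of d depends on implementation-defined padding (commonly 8
   on x86-64). */
```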
Tim Rentsch
2025-01-29 16:18:18 UTC
Post by Tim Rentsch
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
You can define
#define arraysize(x) (sizeof (x) / sizeof ((x)[0]))
You can, but you don't need to.
sizeof foo->bar.buf / sizeof *foo->bar.buf
is just irksome. Why do I have to say that thing twice,
to get its number of elements?
Post by Scott Lurndal
Often readability suffers
when you use macros, not to mention the other quirks of
C macro use (in C++, a constexpr function might be
suitable, but the naming being arbitrary (e.g. arraysize,
NUM_ELEMENTS, SIZE, et alia) doesn't aid in readability
or maintainability.
The naming being arbitrary is the argument for standardizing the
name for the macro and sticking it into, for instance, <stddef.h>.
If we didn't have offsetof there, we might have to deal with
OFFSETOF, offsetof, offset, member_offset, and others.
That's a flawed analogy. A macro to compute the number of
elements in an array can be done in standard C. The
functionality of offsetof cannot be done in standard C, and
that's what it needs to be in the standard library.
Can't it? The various versions I've seen, including mine, look
#define offsetof(a,b) (size_t) &( ((a*)0) -> b)
That form of expression is not guaranteed to work, as other
responses have explained.
As for the other point that was made, when looking at open source
code, every other program seems to contain macros like
MAX
ARRAYLEN
STREQ
But with assorted spellings (the first program I looked at today used
MZ_MAX).
However, every other program also seems to use typedefs to define
their own versions of INT32 etc, even with stdint.h type being
available for 25 years.
Probably historical baggage. It was ten years after C89 that C99
added <stdint.h>, and probably five more years before people
started using C99 regularly. And it didn't help that Microsoft,
in their near-infinite wisdom, didn't ever really support C99,
and waited until C11 before doing an implementation conforming to
a current C standard. So the pre-C99 names had many years to
become entrenched, and after that there was no sufficiently
motivating reason to change them. If it ain't broke don't fix
it.
So being in the standard is not enough if the official name is
too ugly or it is fiddly to type.
Personally I think the type names added in <stdint.h> are both
ugly and almost always a bad choice for other reasons. That
said, I note that uint32_t and uint64_t have become nearly
ubiquitous (and also uint8_t, which is completely baffling, since
unsigned char can be used instead, along with some form of static
assertion for people who are overly obsessive).
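The alternative Tim alludes to can be spelled like this (the typedef name is arbitrary):

```c
#include <limits.h>

/* Instead of uint8_t: use unsigned char directly, plus a static
   assertion that documents (and enforces) the 8-bit-byte assumption. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
typedef unsigned char byte;
```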
James Kuyper
2025-01-24 13:56:34 UTC
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
"There is a clear preference for a lowercase keyword, here, though it is
not by the biggest margin. One would imagine that with the way we keep
standardizing things since C89 (starting with _Keyword and then adding a
header with a macro) that C folks would be overwhelmingly in favor of
simply continuing that style. The graph here, however, tells a different
story: while there’s a large contingency that clearly hates having
_Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that
comes out on top! It’s just having a plain lowercase keyword, instead."
One of the most important goals of the C standard is backwards
compatibility. A lower case keyword would break any program that was
already using that keyword as a user-defined identifier. _Keyword avoids
that problem, because it's an identifier from the namespace reserved to
implementations. Therefore, any code already using that identifier for
some other purpose has undefined behavior anyway, so it's not a problem
as far as the C committee is concerned.
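The _Keyword-plus-header pattern James describes is the one C99 used for its boolean type: the keyword itself lives in the reserved namespace, and a header opts code into the lowercase spelling (before C23 made bool a keyword proper):

```c
#include <stdbool.h>  /* pre-C23: defines bool, true, false as macros */

_Bool raw  = 1;    /* always available: reserved-namespace keyword   */
bool  nice = true; /* available only after including the header      */
```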
Kaz Kylheku
2025-01-24 20:24:16 UTC
Post by James Kuyper
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
"There is a clear preference for a lowercase keyword, here, though it is
not by the biggest margin. One would imagine that with the way we keep
standardizing things since C89 (starting with _Keyword and then adding a
header with a macro) that C folks would be overwhelmingly in favor of
simply continuing that style. The graph here, however, tells a different
story: while there’s a large contingency that clearly hates having
_Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that
comes out on top! It’s just having a plain lowercase keyword, instead."
One of the most important goals of the C standard is backwards
compatibility.
Backward compatibility matters in software.

People use C compiler applications to open text documents of type C.

All that matters is that there is a way to use their old document
with the new application.

It is almost purely a software matter, not requiring anything in the
specification.

The C++ people have already figured this out, and are running
with it (like crazy).

It doesn't matter if the current language has a keyword "arraysize"
which breaks every program that uses it as the name of something
(goto label, struct member, variable, function ...) if
the language implementation has an option like -std=c11
under which that is not a keyword.
Post by James Kuyper
A lower case keyword would break any program that was
That's like saying that the existence of Office 365 breaks
every Word 97 document.

That's only if Office 365 loses the ability to work with such a
document; the mere existence of the new format /per se/ perpetrates no
such harm.

The problem with what I'm saying here is that it requires trust.

The people specifying the language have to abandon their grasp of
the reins of control on the compatibility issue and trust that
the implementors will handle it in good ways for the benefit of
their users.

The people specifying the language also have to accept that
the backward compatibility mechanism is not only out of their
control, but that it has implementation-specific manifestations:
the means by which an implementation is instructed to obey an
older dialect isn't specified in the standard because they have
decided that the manner of presenting a program for processing
by an implementation is out of the Scope.

Even if it were something that were somehow brought within the Scope,
the standard couldn't go as far as to give a requirement like
"a conforming implementation shall provide configurations for
accepting programs in the following historic dialects of C: [...]"
You just can't do that.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
James Kuyper
2025-01-25 01:32:32 UTC
...
Post by Kaz Kylheku
Post by James Kuyper
One of the most important goals of the C standard is backwards
compatibility.
Backward compatibility matters in software.
C code is software, and so are C implementations, so I'm not sure what
that has to do with anything. The committee has explicitly stated a
priority between those two kinds of software: "existing user code
matters; existing implementations don't". That means it's OK for a new
version of the C standard to require a rewrite of C implementations, but
requiring a rewrite of existing code to work with the new standard is to
be avoided.
Post by Kaz Kylheku
People use C compiler applications to open text documents of type C.
All that matters is that there is a way to use their old document
with the new application.
And to be able to use it in a mode where it fully supports all of the
new features of the C implementation.
Post by Kaz Kylheku
It is almost purely a software matter, not requiring anything in the
specification.
The specification needs to make it clear what users can and cannot
expect of the implementation - because that kind of thing is within the
responsibilities of the C standard. They can expect that a new version
of C will seldom (ideally never, but that generally cannot be achieved)
intrude upon the space of names previously reserved for users, so they
won't have to change their code to continue working with the new
features of that version.

...
Post by Kaz Kylheku
It doesn't matter if the current language has a keyword "arraysize"
which breaks every program that uses it as the name of something
(goto label, struct member, variable, function ...) if
the language implementation has an option like -std=c11
under which that is not a keyword.
Backwards compatibility doesn't mean that you can still build your
program using an older version of the language. It means that you can
compile your code your code without change using the newest version of
the language. You only have to make changes to make use of new features
of the language, and you should be confident that you can do so without
worrying about some of the other new features breaking your existing code.
Backwards compatibility is just one of the goals of the C and C++
committees, but it is among the highest priorities for both of
them. That means that one or another of those other high priorities
causes almost every version of either standard to contain a few features
that do break backwards compatibility. Still, when it is easily avoided,
it should be, and choosing a reserved identifier for new keywords is
dead easy. The concept of reserved identifiers was created precisely for
use in this manner.
Kaz Kylheku
2025-01-25 02:40:47 UTC
Post by James Kuyper
...
Post by Kaz Kylheku
Post by James Kuyper
One of the most important goals of the C standard is backwards
compatibility.
Backward compatibility matters in software.
C code is software, and so are C implementations, so I'm not sure what
that has to do with anything.
The committee has explicitly stated a
priority between those two kinds of software: "existing user code
matters; existing implementations don't".
Ensuring existing user code keeps working doesn't have to be a
responsibility of ISO C.

Catering to that at the standard level might not be optimal,
because it causes the committee to excessively fret about
backward compatibility at the specification level.

Whereas implementations can pretty easily support backward
compatibility across incompatible dialects.

ISO C has not remained perfectly backward compatible, so why
pretend to be the keepers of backward compatibility?

- // comments can be used to write a C90 program that isn't valid C99.

- The old auto keyword now has a new meaning. Ancient programs
that used auto for automatic variables no longer work.

- The parameter list () meaning the same thing as (void)
can break programs.
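The first item on that list, concretely (a classic example; whether v ends up as 8 or 13 depends on which standard the compiler is told to follow):

```c
/* Under C90, the tokens are: 10 / (comment) 2, then + 3  -> v == 8.
   Under C99 and later, // starts a comment that eats the rest of
   the line, leaving just 10 + 3                          -> v == 13. */
int v = 10 //* which standard? */ 2
        + 3;
```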

It is the practice of implementations providing dialect options that
helps things go smoothly. They give developers more choice in regard to
the timeline for upgrading their code to newer dialects.
Post by James Kuyper
Post by Kaz Kylheku
It doesn't matter if the current language has a keyword "arraysize"
which breaks every program that uses it as the name of something
(goto label, struct member, variable, function ...) if
the language implementation has an option like -std=c11
under which that is not a keyword.
Backwards compatibility doesn't mean that you can still build your
program using an older version of the language.
That's not all it means, sure.

But if the compiler allows that, then it is definitely providing a
feature that can be called none other than backward compatibility.

It's just a feature of that program, not of the language; the language
isn't backward compatible. Just the translator speaks multiple
languages.

I get the point that if the changes at the language level are so rampant
that users have to keep using compilers in backward compatibility modes
for extended periods of time due to facing too much effort (and repeated
effort) at making their programs work with newer versions of the
language, that is a poor situation.
Post by James Kuyper
It means that you can
compile your code without change using the newest version of
the language. You only have to make changes to make use of new features
of the language, and you should be confident that you can do so without
worrying about some of the other new features breaking your existing code.
What matters is how much it breaks. Renaming an identifier in ten
places, versus thousands of nontrivial changes.

People have ported C code to C++, resolving conflicts on the use
of identifiers like new, class, private, ... and life went on.

I suspect that the ISO C people are so hell bent on backward
compatibility because they want everyone as much as possible to use
the shiniest, newest standard; they don't want an ecosystem in which
people just tell their implementations to use an older dialect and
ignore the new thing.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
James Kuyper
2025-01-25 05:06:23 UTC
Reply
Permalink
...
Post by Kaz Kylheku
Post by James Kuyper
C code is software, and so are C implementations, so I'm not sure what
that has to do with anything.
The committee has explicitly stated a
priority between those two kinds of software: "existing user code
matters; existing implementations don't".
Ensuring existing user code keeps working doesn't have to be a
responsibility of ISO C.
True. Making it the responsibility of C was a decision made by the C
committee, not something they had to do. That decision makes a key part
of C's identity. If you don't like that decision, I strongly recommend
choosing a language managed by an organization that attaches less
importance to backwards compatibility.

...
Post by Kaz Kylheku
ISO C has not remained perfectly backward compatible, so why
pretend to be the keepers of backward compatibility?
They don't. They "pretend" to be people who place a high, but not
absolute, priority on maintaining backwards compatibility. In fact, they
"pretend" it so well that it's actually the reality.

...
Post by Kaz Kylheku
Post by James Kuyper
Backwards compatibility doesn't mean that you can still build your
program using an older version of the language.
That's not all it means, sure.
It's not what it means at all. Backwards compatibility is a relationship
between two different versions of the standard. If one of those two
versions is not, in any way, involved, the concept is meaningless. When
you use a modern compiler that has a mode where it can compile C2023
code, but use it in a different mode where it supports C90, it isn't a
C2023 compiler that's backwards compatible with C90. In that mode, it is
simply a C90 compiler.
Tim Rentsch
2025-01-29 10:13:17 UTC
Reply
Permalink
Post by Kaz Kylheku
Post by James Kuyper
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
"There is a clear preference for a lowercase keyword, here, though
it is not by the biggest margin. One would imagine that with the
way we keep standardizing things since C89 (starting with _Keyword
and then adding a header with a macro) that C folks would be
overwhelmingly in favor of simply continuing that style. The
graph here, however, tells a different story: while there's a
large contingency that clearly hates having _Keyword by itself,
it's not the _Keyword + stdkeyword.h macro that comes out on top!
It's just having a plain lowercase keyword, instead."
One of the most important goals of the C standard is backwards
compatibility.
Backward compatibility matters in software.
People use C compiler applications to open text documents of type C.
All that matter is that there is a way to use their old document
with the new application.
It is almost purely a software matter, not requiring anything in
the specification.
The C++ people have already figured out this out, and are running
with it (like crazy).
It doesn't matter if the current language has a keyword "arraysize"
which breaks every program that uses it as the name of something
(goto label, struct member, variable, function ...) if
the language implementation has an option like -std=c11
under which that is not a keyword.
Post by James Kuyper
A lower case keyword would break any program that was
That's like saying that the existence of Office 365 breaks every
Word 97 document.
That's only if Office 365 loses the ability to work with such a
document; the mere existence of the new format /per se/
perpetrates no such harm.
The problem with what I'm saying here is that it requires trust.
The people specifying the language have to abandon their grasp of
the reins of control on the compatibility issue and trust that the
implementors will handle it in good ways for the benefit of their
users.
It's hard to imagine a stance more antithetical to the point of
having a C standard in the first place.
Post by Kaz Kylheku
The people specifying the language also have to accept that
the backward compatibility mechanism is not only out of their
hands: the means by which an implementation is instructed to obey an
older dialect isn't specified in the standard because they have
decided that the manner of presenting a program for processing
by an implementation is out of the Scope.
Even if it were something that were somehow brought within the
Scope, the standard couldn't go as far as to give a requirement
like "a conforming implementation shall provide configurations
[...]" You just can't do that.
These comments serve to underscore just how bad a decision it is
to add this unnecessary feature to the C standard.
Waldek Hebisch
2025-01-24 23:13:04 UTC
Reply
Permalink
Post by James Kuyper
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
"There is a clear preference for a lowercase keyword, here, though it is
not by the biggest margin. One would imagine that with the way we keep
standardizing things since C89 (starting with _Keyword and then adding a
header with a macro) that C folks would be overwhelmingly in favor of
simply continuing that style. The graph here, however, tells a different
story: while there’s a large contingency that clearly hates having
_Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that
comes out on top! It’s just having a plain lowercase keyword, instead."
One of the most important goals of the C standard is backwards
compatibility. A lower case keyword would break any program that was
already using that keyword as a user-defined identifier.
Lower case _reserved word_ would break compatibility. But in most
cases there is no need to reserve a keyword: simply treat it as a
predefined identifier with magic meaning. If a user gives it a
different meaning, that new meaning would be used instead of the
predefined one.

Of course an implementation could offer more choices, like removing
the predefined meaning of a specific identifier (while allowing use of
the rest of the new stuff), warning about use as an identifier, etc.

The standard could possibly add a pragma to disable specific predefined
identifiers or reserved words.
--
Waldek Hebisch
Kaz Kylheku
2025-01-25 01:17:28 UTC
Reply
Permalink
Post by Waldek Hebisch
Lower case _reserved word_ would break compatibility. But in most
cases there is no need to reserve a keyword: simply treat it as a
predefined identifier with magic meaning. If a user gives it a
different meaning, that new meaning would be used instead of the
predefined one.
char *if = "eth0";

if (uid == 0) ... // error, if has been redefined!
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
James Kuyper
2025-01-25 01:57:05 UTC
Reply
Permalink
...
Post by James Kuyper
One of the most important goals of the C standard is backwards
compatibility. A lower case keyword would break any program that was
already using that keyword as a user-defined identifier.
Lower case _reserved word_ would break compatibility. But in most
cases there is no need to reserve a keyword: simply treat it as a
predefined identifier with magic meaning. If a user gives it a
different meaning, that new meaning would be used instead of the
predefined one.
Unless the syntax that contains the new keyword is such that a
user-defined identifier could never occur in the same location, I don't
see how that would work.

In this particular case, countof(array) certainly does NOT qualify. It's
meant to be used wherever an integer value for the length of the array
is needed. And in any such context, a user-named function returning an
integer value or a user-named macro expanding to an expression with an
integer value is also allowed (in some contexts, only the latter would
be a possibility).
Waldek Hebisch
2025-01-25 21:18:34 UTC
Reply
Permalink
Post by James Kuyper
...
Post by James Kuyper
One of the most important goals of the C standard is backwards
compatibility. A lower case keyword would break any program that was
already using that keyword as a user-defined identifier.
Lower case _reserved word_ would break compatibility. But in most
cases there is no need to reserve a keyword: simply treat it as a
predefined identifier with magic meaning. If a user gives it a
different meaning, that new meaning would be used instead of the
predefined one.
Unless the syntax that contains the new keyword is such that a
user-defined identifier could never occur in the same location, I don't
see how that would work.
In this particular case, countof(array) certainly does NOT qualify. It's
meant to be used wherever an integer value for the length of the array
is needed. And in any such context, a user-named function returning an
integer value or a user-named macro expanding to an expression with an
integer value is also allowed (in some contexts, only the latter would
be a possibility).
AFAICS there is no trouble. Namely, before C23 a compiler had to
handle undeclared functions using the "implicit int" rule. So it had
to distinguish between declarations, function calls to declared
functions, and function calls to undeclared functions. The
new rule informally could be:

whenever pre-C23 compiler would use implicit int rule and the
function name is 'countof' use new semantics of 'countof'

There is some complication: Presumably 'countof' should be
applicable to types. AFAICS when the parser sees 'countof' with no
prior declaration it can activate a mode of allowing both
expressions and types. This should be no big burden as already
'sizeof' gets similar treatment. So parsing could go on
with minor tweak and the rest can be handled at semantic level.

AFAICS what I describe is a compatible extension: it will treat
any valid C23 program as before and only assign meaning to
previously invalid programs. And it should be implementable
with modest effort in any C compiler.

Of course, it would break compatibility for C compilers that
want to allow undeclared functions as an extension to C23,
but IIUC this is not a concern to the standard.
--
Waldek Hebisch
Keith Thompson
2025-01-26 00:28:58 UTC
Reply
Permalink
***@fricas.org (Waldek Hebisch) writes:
[...]
Post by Waldek Hebisch
AFAICS there is no trouble. Namely, before C23 a compiler had to
handle undeclared functions using the "implicit int" rule.
You're off by 24 years. C99 dropped the "implicit int" rule.
Post by Waldek Hebisch
So it had
to distinguish between declarations, function calls to declared
functions and function calls to undeclared functions. The
whenever pre-C23 compiler would use implicit int rule and the
function name is 'countof' use new semantics of 'countof'
It would be pre-C2Y. C23 is frozen, and "countof" won't be added to it.
Post by Waldek Hebisch
There is some complication: Presumably 'countof' should be
applicable to types. AFAICS when parser sees 'countof' with no
prior declaration it can activate mode of allowing both
expressions and types. This should be no big burden as already
'sizeof' gets similar treatment. So parsing could go on
with minor tweak and the rest can be handled at semantic level.
The syntax for sizeof applied to a type is different from the syntax
when applied to an expression:

sizeof unary-expression
sizeof ( type-name )

Of course a unary-expression can be a parenthesized expression, but
there's no ambiguity.
Post by Waldek Hebisch
AFAICS what I describe is compatible extention: it will treat
any valid C23 program as before and only assign meaning to
previously invalid programs. At it should be implementable
with modest effort in any C compiler.
Of course, it would break compatibility for C compilers that
want to allow undeclared functions as an extention to C23,
but IIUC this is not a concern to the standard.
I personally dislike the idea of making countof a reserved identifier
rather than a keyword. It makes the rules around it more complicated,
which makes the language harder to understand. And I'm not convinced it
avoids breaking old code. If the point is to allow for old code that
uses countof as an identifier, then this valid pre-C2Y code:

size_t countof(int *ptr);
int arr[10];
countof(arr);

would change its meaning (though the existing function couldn't actually
determine the number of elements in the array).

I understand the need for backward compatibility, but most new editions
of the C standard have broken *some* existing code by defining new
lowercase keywords, starting with inline and restrict in C99. (C11
added only _Keywords, but C23 adds 11 new lowercase keywords.)

I doubt that there's all that much existing code that uses "countof" as
an identifier, and that can be dealt with by compiling it in C23 mode or
earlier.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
Waldek Hebisch
2025-01-26 01:48:38 UTC
Reply
Permalink
Post by Keith Thompson
[...]
Post by Waldek Hebisch
AFAICS there is no trouble. Namely, before C23 a compiler had to
handle undeclared functions using the "implicit int" rule.
You're off by 24 years. C99 dropped the "implicit int" rule.
Post by Waldek Hebisch
So it had
to distinguish between declarations, function calls to declared
functions and function calls to undeclared functions. The
whenever pre-C23 compiler would use implicit int rule and the
function name is 'countof' use new semantics of 'countof'
It would be pre-C2Y. C23 is frozen, and "countof won't be added to it.
Post by Waldek Hebisch
There is some complication: Presumably 'countof' should be
applicable to types. AFAICS when parser sees 'countof' with no
prior declaration it can activate mode of allowing both
expressions and types. This should be no big burden as already
'sizeof' gets similar treatment. So parsing could go on
with minor tweak and the rest can be handled at semantic level.
The syntax for sizeof applied to a type is different from the syntax
sizeof unary-expression
sizeof ( type-name )
Of course a unary-expression can be a parenthesized expression, but
there's no ambiguity.
Post by Waldek Hebisch
AFAICS what I describe is a compatible extension: it will treat
any valid C23 program as before and only assign meaning to
previously invalid programs. And it should be implementable
with modest effort in any C compiler.
Of course, it would break compatibility for C compilers that
want to allow undeclared functions as an extension to C23,
but IIUC this is not a concern to the standard.
I personally dislike the idea of making countof a reserved identifier
rather than a keyword. It makes the rules around it more complicated,
which makes the language harder to understand. And I'm not convinced it
avoids breaking old code. If the point is to allow for old code that
size_t countof(int *ptr);
int arr[10];
countof(arr);
would change its meaning (though the existing function couldn't actually
determine the number of elements in the array).
Above, 'countof' is declared, so with the rule above 'countof(arr)'
would be a function call as before (that's the whole point of the
rule: it fires only when the use of 'countof' _is not_ valid as a
function call).
Post by Keith Thompson
I understand the need for backward compatibility, but most new editions
of the C standard have broken *some* existing code by defining new
lowercase keywords, starting with inline and restrict in C99. (C11
added only _Keywords, but C23 adds 11 new lowercase keywords.)
I doubt that there's all that much existing code that uses "countof" as
an identifier, and that can be dealt with by compiling it in C23 mode or
earlier.
Some things need to be reserved, for example new declaration keywords.
To remain sane we do want to reserve things like 'if' even if
technically it would be possible to allow redefinition (I did not
check whether a non-reserved 'if' would still allow unambiguous
parsing; the very idea of redefining 'if' is too crazy). But 'sizeof',
'alignof' and similar could be 'non-reserved'. 'sizeof' has been
reserved for a long time, so there is no need to make it non-reserved,
but for new keywords there is a compatibility gain. Basically,
"do not break things without need". Or to put it differently,
'countof' is a little non-essential nicety. While it would not
break much code, the gain for it is small, so it is not clear that
it justifies any breakage.
--
Waldek Hebisch
Tim Rentsch
2025-01-29 10:31:36 UTC
Reply
Permalink
Post by Keith Thompson
I understand the need for backward compatibility, but most new
editions of the C standard have broken *some* existing code by
defining new lowercase keywords, starting with inline and restrict
in C99. (C11 added only _Keywords, but C23 adds 11 new lowercase
keywords.)
Saying the same mistake has been made in the past is not an
argument for making it again in the future. If anything the bar
should be raised each time, not lowered.
Kaz Kylheku
2025-01-24 19:45:21 UTC
Reply
Permalink
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
"There is a clear preference for a lowercase keyword, here, though it is
not by the biggest margin. One would imagine that with the way we keep
standardizing things since C89 (starting with _Keyword and then adding a
header with a macro) that C folks would be overwhelmingly in favor of
simply continuing that style. The graph here, however, tells a different
story: while there’s a large contingency that clearly hates having
_Keyword by itself, it’s not the _Keyword + stdkeyword.h macro that
comes out on top! It’s just having a plain lowercase keyword, instead."
The best way to have versioning for this in a C compiler is a
language dialect selection option. Say that C27 (or whatever year)
adds an "arraysize" keyword.

- To use that keyword in the updated gcc, you use -std=c27.

- To compile programs written for C99, use -std=c99. Those
programs can use arraysize as an identifier.

- To move those programs to C27, you fix all the uses of
that identifier.

I don't see why we need _Keyword cruft with a header which provides
a macro alias, when compilers not only can have dialect selection
options but realistically cannot avoid doing so.

I hope the proposed array size keyword raises a constraint violation when
applied to a non-array. GCC had to develop a diagnostic for when sizeof
ptr is divided by sizeof ptr[0]!
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Alexis
2025-01-24 22:39:34 UTC
Reply
Permalink
Post by Kaz Kylheku
The best way to have versioning for this in a C compiler is a
language dialect selection option.
Indeed, the article links to a PDF of slides, "Pitch for #dialect
directive" (N3407):

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3407.pdf


Alexis.
Kaz Kylheku
2025-01-25 01:16:07 UTC
Reply
Permalink
Post by Alexis
Post by Kaz Kylheku
The best way to have versioning for this in a C compiler is a
language dialect selection option.
Indeed, the article links to a PDF of slides, "Pitch for #dialect
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3407.pdf
That's completely silly.

A source file's dialect preference could be easily communicated
by #define-ing a certain magic macro.

#define _STD_DIALECT "... some format in here ..."

If that is not appealing, it could be a #pragma:

#pragma dialect "..."

There is no need whatsoever to invent a new directive for this,
or anything else that is not ... an extension of preprocessing!

If it were a numeric value of type intmax_t, then the implementation's
header files could use ordinary preprocessing conditionals to select
dialect-specific definitions.

Dialects could be defined by 7 arguments, typically a combination
of character and integer constants:

E.g.

// GNU C, accepted by GCC 11.1.0

#define _STD_DIALECT _STD_MKDIALECT('G','N','U','C',11,1,0)

The fifth argument is 0 to 32767. The others are 0 to 127.

Standard dialects could be identified like this:

// Standard C from May 11, 2027.
#define _STD_DIALECT _STD_MKDIALECT('S','T','D','C',2027,5,11)

Dialect integers can easily be tested. An implementation could
test the first four bytes to detect whether it supports that
family of dialects at all, and if so, it could switch things
based on the specific numbers. This is easy to do using
nothing but preprocessing, plus the compiler can peek at the
variable and also make decisions internally.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2025-01-29 10:17:05 UTC
Reply
Permalink
Post by Kaz Kylheku
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
"There is a clear preference for a lowercase keyword, here, though
it is not by the biggest margin. One would imagine that with the
way we keep standardizing things since C89 (starting with _Keyword
and then adding a header with a macro) that C folks would be
overwhelmingly in favor of simply continuing that style. The graph
here, however, tells a different story: while there's a large
contingency that clearly hates having _Keyword by itself, it's not
the _Keyword + stdkeyword.h macro that comes out on top! It's just
having a plain lowercase keyword, instead."
The best way to have versioning for this in a C compiler is a
language dialect selection option. [...]
The best way is not to add this unnecesary feature to the standard
at all.
Tim Rentsch
2025-01-29 10:19:59 UTC
Reply
Permalink
Post by Alexis
Hi all,
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
operator: [...]
Sadly it is becoming more and more apparent that the C standard
committee has been infected by the C++ disease.

(My thanks to Alexis for keeping the group informed on this
matter.)
Ben Bacarisse
2025-01-29 16:00:54 UTC
Reply
Permalink
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
-- https://thephd.dev/the-big-array-size-survey-for-c-results
Curious. The top objection to the usual macro solution is given as:

* double-evaluation of e.g. getting the size of the 1-d part of a 2-d
array int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);

Does the author not know that there is no evaluation of the operands of
sizeof in this example?

His "About" pages says "Project Editor for ISO/IEC JTC1 SC22 WG14 -
Programming Languages, C".
--
Ben.
David Brown
2025-01-29 17:01:00 UTC
Reply
Permalink
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of a 2-d
array int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the operands of
sizeof in this example?
6.5.3.4p2 :

"""
If the type of the operand is a variable length array type, the operand
is evaluated; otherwise, the operand is not evaluated and the result is
an integer constant.
"""

I don't know if that is the source of the double-evaluation concern
here, but it is certainly a situation in which sizeof /does/ evaluate
its operand.
Post by Ben Bacarisse
His "About" pages says "Project Editor for ISO/IEC JTC1 SC22 WG14 -
Programming Languages, C".
Ben Bacarisse
2025-01-30 00:31:45 UTC
Reply
Permalink
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of a 2-d
array int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the operands of
sizeof in this example?
"""
If the type of the operand is a variable length array type, the operand is
evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.
"""
I don't know if that is the source of the double-evaluation concern here,
but it is certainly a situation in which sizeof /does/ evaluate its
operand.
It would have been a good idea to pick an example that behaves as
claimed. Let's hope this sort of casual approach is reserved for blogs.
--
Ben.
David Brown
2025-01-30 09:59:43 UTC
Reply
Permalink
Post by Ben Bacarisse
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the results
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of a 2-d
array int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the operands of
sizeof in this example?
"""
If the type of the operand is a variable length array type, the operand is
evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.
"""
I don't know if that is the source of the double-evaluation concern here,
but it is certainly a situation in which sizeof /does/ evaluate its
operand.
It would have been a good idea to pick an example that behaves as
claimed. Let's hope this sort of casual approach is reserved for blogs.
Agreed.

I have yet to see a serious example of VLA use where the evaluation of
the operand is actually relevant - I don't count VLA declarations of
size "rand()" as realistic.

It is possible to construct an example where a macro akin to his
SIZE_KEYWORD could result in a double evaluation when a VLA is used.
But you have to try harder than the blog author did. For example:

int foo(int n) {
    int meow[3][4][n];
    return SIZE_KEYWORD(meow[first_idx()]);
}

I needed a three-dimensional VLA to get a double evaluation of first_idx().


His type-safety argument is a lot stronger IMHO. Simple "ARRAY_SIZE"
macros are happy with some pointers rather than requiring arrays, while
a language operator would give compile-time errors if you made a mistake
like :

int foo(int arr[]) {
    size_t n = ARRAY_SIZE(arr);
    ...

gcc (and presumably other compilers) can warn about at least some such
code errors, and gcc extensions let you write a more advanced ARRAY_SIZE
like the one defined in Linux.

IMHO a more useful approach would have been to standardise the gcc
extension used to make the Linux macro (__builtin_types_compatible_p).
This can be useful in many other circumstances. Then it could be used
in a macro defined in a standard header for the C library so that other
programmers don't have to figure it out for themselves. And it should
be "cheaper" to add a new standard header and a macro than a new
keyword, with less potential for conflict with existing code.

(I believe there are a variety of other ways to make a safe array_size
macro using gcc extensions.)
Tim Rentsch
2025-01-30 20:13:11 UTC
Reply
Permalink
Post by Ben Bacarisse
Post by David Brown
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of
a 2-d array
int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the
operands of sizeof in this example?
"""
If the type of the operand is a variable length array type, the
operand is evaluated; otherwise, the operand is not evaluated and
the result is an integer constant.
"""
I don't know if that is the source of the double-evaluation concern
here, but it is certainly a situation in which sizeof /does/
evaluate its operand.
It would have been a good idea to pick an example that behaves as
claimed. Let's hope this sort of casual approach is reserved for blogs.
All of the motivational examples listed are lame. Everyone knows
that macro calls might evaluate an argument twice, and so knows to
avoid calling macros on expressions with side-effects. It's true that
the usual macro definition to determine array extent misbehaves
but that can simply be called out as a warning without needing to
codify the situation by putting it in the C standard; in other
words it's a quality of implementation issue, not a language issue
(and some cases are already diagnosed by both gcc and clang). As
for the problem of name collision, choosing a longer name and
having it be all caps (as most macro names should be) gives a
collision cross section that is vanishingly small. The names
ARRAY_INDEX_LIMIT() and ARRAY_EXTENT() are both descriptive
enough and unlikely-to-collide enough that they can be used
without any significant danger of collision.

What would be better is to give some general tools that would
allow a user-defined macro to be written safely. For example:

_Has_array_type( e ) 1 if and only if the expression
'e' has array type

_Is_side_effect_free( e ) 1 if and only if the expression
'e' has no side effects, so
multiple evaluations have no
negative consequences

Furthermore because these tests are likely to be called only
inside macro definitions, using the _Leading_capital style of
naming shouldn't be a problem.
Scott Lurndal
2025-01-30 21:33:40 UTC
Reply
Permalink
Post by Tim Rentsch
Post by Ben Bacarisse
Post by David Brown
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of
a 2-d array
int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the
operands of sizeof in this example?
"""
If the type of the operand is a variable length array type, the
operand is evaluated; otherwise, the operand is not evaluated and
the result is an integer constant.
"""
I don't know if that is the source of the double-evaluation concern
here, but it is certainly a situation in which sizeof /does/
evaluate its operand.
It would have been a good idea to pick an example that behaves as
claimed. Let's hope this sort of casual approach is reserved for blogs.
All of the motivational examples listed are lame. Everyone knows
that macro calls might evaluate an argument twice, and so knows to
avoid calling macros on expressions with side-effects. It's true that
the usual macro definition to determine array extent misbehaves
but that can simply be called out as a warning without needing to
codify the situation by putting it in the C standard; in other
words it's a quality of implementation issue, not a language issue
(and some cases are already diagnosed by both gcc and clang). As
for the problem of name collision, choosing a longer name and
having it be all caps (as most macro names should be) gives a
collision cross section that is vanishingly small. The names
ARRAY_INDEX_LIMIT() and ARRAY_EXTENT() are both descriptive
enough and unlikely-to-collide enough that they can be used
without any significant danger of collision.
What would be better is to give some general tools that would
_Has_array_type( e ) 1 if and only if the expression
'e' has array type
_Is_side_effect_free( e ) 1 if and only if the expression
'e' has no side effects, so
multiple evaluations have no
negative consequences
Furthermore because these tests are likely to be called only
inside macro definitions, using the _Leading_capital style of
naming shouldn't be a problem.
Seems like a lot of cruft just to save a small bit of
typing by the programmer.

The decades old standard idiom of sizeof(x)/sizeof(x[0])
is self-documenting and requires no macros or new language
features.
Kaz Kylheku
2025-01-30 22:31:25 UTC
Post by Scott Lurndal
The decades old standard idiom of sizeof(x)/sizeof(x[0])
is self-documenting and requires no macros or new language
features.
I've seen the bug more than once whereby x was a pointer.

That's a threat whether you use a macro or open-code it,
of course.

GCC has developed a diagnostic for this; so whether you have
a macro or write it out as above, you are covered.

If a standard-defined operator or macro were available for this, it
could require a diagnostic in every conforming implementation,
when the argument isn't an array. Implementations that don't
have a diagnostic for sizeof(ptr)/sizeof(ptr[0]) would still
have to diagnose countof(ptr), or whatever it is called.

That would be a benefit in addition to saving keystrokes.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2025-02-19 03:46:48 UTC
Post by Scott Lurndal
Post by Tim Rentsch
Post by Ben Bacarisse
Post by David Brown
Post by Ben Bacarisse
Post by Alexis
JeanHeyd Meneide, a Project Editor for WG14, has just posted the
results of a survey re. the preferred form of a new array size
-- https://thephd.dev/the-big-array-size-survey-for-c-results
* double-evaluation of e.g. getting the size of the 1-d part of
a 2-d array
int meow[3][4]; /* ... */ SIZE_KEYWORD(meow[first_idx()]);
Does the author not know that there is no evaluation of the
operands of sizeof in this example?
"""
If the type of the operand is a variable length array type, the
operand is evaluated; otherwise, the operand is not evaluated and
the result is an integer constant.
"""
I don't know if that is the source of the double-evaluation concern
here, but it is certainly a situation in which sizeof /does/
evaluate its operand.
It would have been a good idea to pick an example that behaves as
claimed. Let's hope this sort of casual approach is reserved for blogs.
All of the motivational examples listed are lame. Everyone knows
that macro calls might evaluate an argument twice, and so to avoid
calling macros on expressions with side-effects. It's true that
the usual macro definition to determine array extent misbehaves
but that can simply be called out as a warning without needing to
codify the situation by putting it in the C standard; in other
words it's a quality of implementation issue, not a language issue
(and some cases are already diagnosed by both gcc and clang). As
for the problem of name collision, choosing a longer name and
having it be all caps (as most macro names should be) gives a
collision cross section that is vanishingly small. Either of the
names ARRAY_INDEX_LIMIT() or ARRAY_EXTENT() are both descriptive
enough and unlikely-to-collide enough that they can be used
without any significant danger of collision.
What would be better is to give some general tools that would
_Has_array_type( e ) 1 if and only if the expression
'e' has array type
_Is_side_effect_free( e ) 1 if and only if the expression
'e' has no side effects, so
multiple evaluations have no
negative consequences
Furthermore because these tests are likely to be called only
inside macro definitions, using the _Leading_capital style of
naming shouldn't be a problem.
Seems like a lot of cruft just to save a small bit of
typing by the programmer.
The decades old standard idiom of sizeof(x)/sizeof(x[0])
is self-documenting and requires no macros or new language
features.
Speaking for myself personally, I don't mind using the common
idiom (although I consider it better practice to write a macro
for it, but that is a minor point), and consider the proposal
to add a new language construct to the C standard to be worse
than simply a waste of time.

The point of my earlier comment is that, if something is going to
be added to the C standard, it would be better if what were added
provided some generality so it could be used for more than just
the number of elements in an array. An endless series of special
cases is anathema to good language design.
