Discussion: How to fix: [-Wformat-nonliteral]
MehdiAmini
2017-04-07 07:50:25 UTC
Hi,

Compiling this function using Clang:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>

noreturn void my_err(const char * const s ,...)
{

va_list args;
va_start(args, s);

if( (s==NULL) || ( *s=='\0') )
{
fprintf(stderr,"Please provide a proper message for my_err
function.\n");
exit(EXIT_FAILURE);
}


vfprintf(stderr,s,args);

fprintf(stderr,"\n");

va_end(args);

exit(EXIT_FAILURE);
}

I get:
warning: format string is not a string literal [-Wformat-nonliteral]
vfprintf(stderr,s,args);
^

How to fix the warning?
--
www.my-c-codes.com/

Farewell.
Barry Schwarz
2017-04-07 08:28:42 UTC
Post by MehdiAmini
Hi,
noreturn void my_err(const char * const s ,...)
{
va_list args;
va_start(args, s);
if( (s==NULL) || ( *s=='\0') )
{
fprintf(stderr,"Please provide a proper message for my_err
function.\n");
exit(EXIT_FAILURE);
}
vfprintf(stderr,s,args);
fprintf(stderr,"\n");
va_end(args);
exit(EXIT_FAILURE);
}
warning: format string is not a string literal [-Wformat-nonliteral]
vfprintf(stderr,s,args);
^
How to fix the warning?
Specify -Wno-format-nonliteral
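For example, the whole translation unit can be compiled with the check
disabled (clang -Wall -Wno-format-nonliteral file.c), or the warning can
be silenced around just the one call with Clang's diagnostic pragmas.
Below is a minimal sketch of the pragma form, assuming a reasonably
recent Clang; the rest of the function body is elided:

/* Silence -Wformat-nonliteral only for this call; the warning stays
   enabled for the rest of the file. */
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wformat-nonliteral"
vfprintf(stderr, s, args);
#pragma clang diagnostic pop

Independently of suppressing the warning, it is usually worth declaring
my_err with the printf format attribute (__attribute__((format(printf,
1, 2)))) so the compiler can at least check each caller's arguments
against its format string.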
--
Remove del for email
David Brown
2017-04-07 09:58:21 UTC
Post by MehdiAmini
Hi,
noreturn void my_err(const char * const s ,...)
{
va_list args;
va_start(args, s);
if( (s==NULL) || ( *s=='\0') )
{
fprintf(stderr,"Please provide a proper message for my_err
function.\n");
exit(EXIT_FAILURE);
}
vfprintf(stderr,s,args);
fprintf(stderr,"\n");
va_end(args);
exit(EXIT_FAILURE);
}
warning: format string is not a string literal [-Wformat-nonliteral]
vfprintf(stderr,s,args);
^
How to fix the warning?
Another poster showed you how to disable the warning. But you might
also like to consider /why/ clang has this warning - and if you would
rather change the logic of your function instead of disabling the warning.

The format string of the printf functions is like a little specialised
interpreted language. Get the syntax wrong, or get a mismatch in the
arguments, and you can break all sorts of things in all sorts of ways.
This can be due to simple bugs, or intentional security breaches. And
in a function like this one, which will typically rarely be called
during testing, it is easy to get things wrong.

So it would be /much/ safer to have the function as:

noreturn void my_err_s(const char * const s)
{
if ((s == NULL) || (*s == '\0')) {
fprintf(stderr, "Please provide a proper message for "
"my_err function.\n");
exit(EXIT_FAILURE);
}
fprintf(stderr, "%s\n", s);
exit(EXIT_FAILURE);
}

#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
my_err_s(buff); \
} while (0)

This makes any formatting be handled locally in the function that calls
my_err, where the compiler can see the format string as a literal and
check the format against the actual parameters used. (There is, IIRC, a
slight gcc'ism in the token pasting here to get the commas right even if
my_err is called with no extra parameters. But I expect clang supports
the same feature.)
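A hypothetical call site (the names here are invented purely for
illustration) shows what this buys - the format string is a literal at
the point of expansion, so the compiler can match it against the
arguments:

/* Hypothetical caller: "cfg", "path" and "line_no" are made-up names.
   A mismatch, such as passing a pointer where %d is expected, is now
   diagnosed at the call site. */
if (cfg == NULL)
    my_err("cannot parse %s at line %d", path, line_no);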
Ben Bacarisse
2017-04-07 10:43:53 UTC
<snip>
Post by David Brown
Post by MehdiAmini
vfprintf(stderr,s,args);
<snip>
Post by David Brown
Post by MehdiAmini
warning: format string is not a string literal [-Wformat-nonliteral]
vfprintf(stderr,s,args);
<snip>
Post by David Brown
Another poster showed you how to disable the warning.
<snip>
Post by David Brown
noreturn void my_err_s(const char * const s)
{
if ((s == NULL) || (*s == '\0')) {
fprintf(stderr, "Please provide a proper message for "
"my_err function.\n");
exit(EXIT_FAILURE);
}
fprintf(stderr, "%s\n", s);
exit(EXIT_FAILURE);
}
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
my_err_s(buff); \
} while (0)
This makes any formatting be handled locally in the function that calls
my_err, where the compiler can see the format string as a literal and
check the format against the actual parameters used. (There is, IIRC, a
slight gcc'ism in the token pasting here to get the commas right even if
my_err is called with no extra parameters. But I expect clang supports
the same feature.)
There are two GNUisms -- one is the comma pasting trick to handle zero
variable arguments, and the other is naming the variable arguments as x.

In fact, you don't need either as there is always one required
argument. This is standard C99 and up:

#define my_err(...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, __VA_ARGS__); \
buff[my_errPrintfBuffLen - 1] = 0; \
my_err_s(buff); \
} while (0)
--
Ben.
David Brown
2017-04-07 11:35:31 UTC
Post by Ben Bacarisse
<snip>
Post by David Brown
Post by MehdiAmini
vfprintf(stderr,s,args);
<snip>
Post by David Brown
Post by MehdiAmini
warning: format string is not a string literal [-Wformat-nonliteral]
vfprintf(stderr,s,args);
<snip>
Post by David Brown
Another poster showed you how to disable the warning.
<snip>
Post by David Brown
noreturn void my_err_s(const char * const s)
{
if ((s == NULL) || (*s == '\0')) {
fprintf(stderr, "Please provide a proper message for "
"my_err function.\n");
exit(EXIT_FAILURE);
}
fprintf(stderr, "%s\n", s);
exit(EXIT_FAILURE);
}
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
my_err_s(buff); \
} while (0)
This makes any formatting be handled locally in the function that calls
my_err, where the compiler can see the format string as a literal and
check the format against the actual parameters used. (There is, IIRC, a
slight gcc'ism in the token pasting here to get the commas right even if
my_err is called with no extra parameters. But I expect clang supports
the same feature.)
There are two GNUisms -- one is the comma pasting trick to handle zero
variable arguments, and the other is naming the variable arguments as x.
In fact, you don't need either as there is always one required
argument. This is standard C99 and up:
#define my_err(...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, __VA_ARGS__); \
buff[my_errPrintfBuffLen - 1] = 0; \
my_err_s(buff); \
} while (0)
Good point. The macro was a modified copy of something from my own
code, which in this case already has plenty of GNUisms - a couple more
won't affect portability! And while the OP here is using clang, which I
believe supports these extensions, removing them makes the code more
portable for other people.
Ben Bacarisse
2017-04-07 12:50:25 UTC
David Brown <***@hesbynett.no> writes:
<snip>
Post by David Brown
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
I was also going to mention this. It seems a bit belt and braces. Is
there any problem with

snprintf(buff, sizeof(buff), f, ## x); \

and no extra "buff[my_errPrintfBuffLen - 1] = 0;"?

Incidentally, mixing the use of my_errPrintfBuffLen and sizeof buff is
less than ideal. I'd either use the defined constant only in the
declaration and sizeof buff - 1 in both other places, or, if I felt the
need to avoid confusing people with sizeof, I'd use the defined constant
everywhere.
Post by David Brown
my_err_s(buff); \
} while (0)
I might even just write

#define my_err(...) \
do { \
char buff[256]; \
snprintf(buff, sizeof(buff), __VA_ARGS__); \
my_err_s(buff); \
} while (0)

<snip>
--
Ben.
David Brown
2017-04-07 13:54:18 UTC
Post by Ben Bacarisse
<snip>
Post by David Brown
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
I was also going to mention this. It seems a bit belt and braces. Is
there any problem with
snprintf(buff, sizeof(buff), f, ## x); \
and no extra "buff[my_errPrintfBuffLen - 1] = 0;"?
Looks like you are right - snprintf guarantees to put a null character
at the end. The history of this macro in my code stretches back to use
with older compilers from long ago, where I was less confident in the
accuracy of the C library implementation. I can't remember if I
actually found a problem with a missing termination, or merely thought
it was a possibility.
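A minimal sketch of that guarantee, assuming a C99-conforming library:
when the size argument is nonzero, snprintf writes at most size - 1
characters, always appends the terminating null, and returns the length
it would have needed.

#include <stdio.h>

int main(void)
{
    char buf[8];
    int n = snprintf(buf, sizeof(buf), "%s", "hello, world");
    /* Truncated but still a proper string: n == 12, buf == "hello, " */
    printf("n=%d buf=\"%s\"\n", n, buf);
    return 0;
}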
Post by Ben Bacarisse
Incidentally, mixing the use of my_errPrintfBuffLen and sizeof buff is
less than ideal. I'd either use the defined constant only in the
declaration and sizeof buff - 1 in both other places, or, if I felt the
need to avoid confusing people with sizeof, I'd use the defined constant
everywhere.
Either method would work - I view the two expressions as synonymous and
interchangeable. It might be a little different if the uses were
separated over a longer distance in code rather than being adjacent lines.

And I know that some people think it is nice to write "sizeof(buff)",
others prefer "sizeof buff". I'd rather not argue too much about style
here.
Post by Ben Bacarisse
Post by David Brown
my_err_s(buff); \
} while (0)
I might even just write
#define my_err(...) \
do { \
char buff[256]; \
snprintf(buff, sizeof(buff), __VA_ARGS__); \
my_err_s(buff); \
} while (0)
That would be fine.
Ben Bacarisse
2017-04-07 18:35:38 UTC
Post by David Brown
Post by Ben Bacarisse
<snip>
Post by David Brown
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
I was also going to mention this. It seems a bit belt and braces. Is
there any problem with
snprintf(buff, sizeof(buff), f, ## x); \
and no extra "buff[my_errPrintfBuffLen - 1] = 0;"?
Looks like you are right - snprintf guarantees to put a null character
at the end. The history of this macro in my code stretches back to use
with older compilers from long ago, where I was less confident in the
accuracy of the C library implementation. I can't remember if I
actually found a problem with a missing termination, or merely thought
it was a possibility.
Yes, that was why I asked. The sn* family predate the C99 standard but
I don't recall any that would not generate a string (unless the size is
zero of course). But then my memory is not what it was, so I asked if
you had had reason to be suspicious.
Post by David Brown
Post by Ben Bacarisse
Incidentally, mixing the use of my_errPrintfBuffLen and sizeof buff is
less than ideal. I'd either use the defined constant only in the
declaration and sizeof buff - 1 in both other places, or, if I felt the
need to avoid confusing people with sizeof, I'd use the defined constant
everywhere.
Either method would work - I view the two expressions as synonymous and
interchangeable. It might be a little different if the uses were
separated over a longer distance in code rather than being adjacent lines.
And I know that some people think it is nice to write "sizeof(buff)",
others prefer "sizeof buff". I'd rather not argue too much about style
here.
Same here. I wrote sizeof buff because that's how I write it. I did
not intend to suggest there was any reason to change or re-hash that old
chestnut. The mixing of the constant and sizeof is slightly more than
style, though, since there's a small cognitive cost to checking the
constant. Even if defined just above, you need to check the spelling in
both usages just in case there's some other similar #define for
something else like my_logPrintBuffLen.

<snip>
--
Ben.
David Brown
2017-04-08 11:41:58 UTC
Post by Ben Bacarisse
Post by David Brown
Post by Ben Bacarisse
<snip>
Post by David Brown
#define my_errPrintfBuffLen 256
#define my_err(f, x...) \
do { \
char buff[my_errPrintfBuffLen]; \
snprintf(buff, sizeof(buff) - 1, f, ## x); \
buff[my_errPrintfBuffLen - 1] = 0; \
I was also going to mention this. It seems a bit belt and braces. Is
there any problem with
snprintf(buff, sizeof(buff), f, ## x); \
and no extra "buff[my_errPrintfBuffLen - 1] = 0;"?
Looks like you are right - snprintf guarantees to put a null character
at the end. The history of this macro in my code stretches back to use
with older compilers from long ago, where I was less confident in the
accuracy of the C library implementation. I can't remember if I
actually found a problem with a missing termination, or merely thought
it was a possibility.
Yes, that was why I asked. The sn* family predate the C99 standard but
I don't recall any that would not generate a string (unless the size is
zero of course). But then my memory is not what it was, so I asked if
you had had reason to be suspicious.
I can't remember the details of the tools I had at that time. But in
general, I have in the past used quite a number of compilers (and
libraries) that intentionally or unintentionally (i.e., toolchain bugs)
do not follow the rules of C. Some of the fun ones include:

Floating point only working for local variables, not statically
allocated variables.

Uninitialised data that is not zeroed before main().

"const" being treated as meaning "in flash", and using different
assembly instructions for access than ram data - thus casting a "char *"
pointer to a "const char *" pointer will break the program.

Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.

And of course, lots of bugs in compilers and libraries. Many of the
tools I have used have been small ones for odd microcontrollers - they
simply haven't had the levels of use and testing of "big" compilers.


But perhaps the reason I explicitly added a terminating 0 to that macro
was that when I wrote it, I didn't know the details of snprintf well enough!
Post by Ben Bacarisse
Post by David Brown
Post by Ben Bacarisse
Incidentally, mixing the use of my_errPrintfBuffLen and sizeof buff is
less than ideal. I'd either use the defined constant only in the
declaration and sizeof buff - 1 in both other places, or, if I felt the
need to avoid confusing people with sizeof, I'd use the defined constant
everywhere.
Either method would work - I view the two expressions as synonymous and
interchangeable. It might be a little different if the uses were
separated over a longer distance in code rather than being adjacent lines.
And I know that some people think it is nice to write "sizeof(buff)",
others prefer "sizeof buff". I'd rather not argue too much about style
here.
Same here. I wrote sizeof buff because that's how I write it. I did
not intend to suggest there was any reason to change or re-hash that old
chestnut. The mixing of the constant and sizeof is slightly more than
style, though, since there's a small cognitive cost to checking the
constant. Even if defined just above, you need to check the spelling in
both usages just in case there's some other similar #define for
something else like my_logPrintBuffLen.
Fair 'nuff.
s***@casperkitty.com
2017-04-10 20:52:25 UTC
Post by David Brown
I can't remember the details of the tools I had at that time. But in
general, I have in the past used quite a number of compilers (and
libraries) that intentionally or unintentionally (i.e., toolchain bugs)
do not follow the rules of C. Some of the fun ones include:
Floating point only working for local variables, not statically
allocated variables.
I can't think of any implementations with that restriction, though I recall
one that didn't support floating-point types at all. It might possibly have
allowed floating-point computations in constant expressions, but if it did I
never used that feature. I can't think why only local variables could use
floating-point, though.
Post by David Brown
Uninitialised data that is not zeroed before main().
For some embedded systems, it's essential to have some storage which is
not zeroed before main, since various events may cause the system to
reset and re-execute main() from scratch but with the contents of memory
undisturbed. If a controller has a "power-on" hardware flag which gets
set by hardware whenever the voltage falls below a safe operational level
and can only be cleared by software when voltage is adequate for operation,
software can use that flag to manually initialize things that need to be
initialized.

The Standard regards as implementation-defined the nature and means by which
an embedded program starts operation. It allows systems to define alternate
function signatures and startup patterns; I don't know whether a conforming
implementation could refrain from initializing global variables before it
called "int main(void)", but I think it document the behavior of "void main()"
as being similar to "int main(void)" but without zero-initialization.
Post by David Brown
"const" being treated as meaning "in flash", and using different
assembly instructions for access than ram data - thus casting a "char *"
pointer to a "const char *" pointer will break the program.
I somewhat prefer the approach of using different qualifiers for things which
are guaranteed to be in RAM, guaranteed to be in code space, or might be in
either, but using "const" could be considered a practical approach even though
it is non-conforming.
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
David Brown
2017-04-10 22:18:25 UTC
Post by s***@casperkitty.com
Post by David Brown
I can't remember the details of the tools I had at that time. But in
general, I have in the past used quite a number of compilers (and
libraries) that intentionally or unintentionally (i.e., toolchain bugs)
do not follow the rules of C. Some of the fun ones include:
Floating point only working for local variables, not statically
allocated variables.
I can't think of any implementations with that restriction, though I recall
one that didn't support floating-point types at all. It might possibly have
allowed floating-point computations in constant expressions, but if it did I
never used that feature. I can't think why only local variables could use
floating-point, though.
It is highly unlikely that anyone would guess that situation - it was an
odd one, due mainly to hardware issues on the board that made access to
external memory problematic, while the stack and local variables were in
the small internal memory on the device.
Post by s***@casperkitty.com
Post by David Brown
Uninitialised data that is not zeroed before main().
For some embedded systems, it's essential to have some storage which is
not zeroed before main, since various events may cause the system to
reset and re-execute main() from scratch but with the contents of memory
undisturbed. If a controller has a "power-on" hardware flag which gets
set by hardware whenever the voltage falls below a safe operational level
and can only be cleared by software when voltage is adequate for operation,
software can use that flag to manually initialize things that need to be
initialized.
If you are relying on the state of memory being preserved at the start
of main(), then you are working outside of standard C. There are three
common ways to deal with that. One is to use named sections for "no
initialisation" data areas for the variables of interest. Another is to
fake that by using manually assigned addresses to RAM that is outside of
the areas used by C. And a third is to put the relevant C code in
sections or hooks for execution before the C runtime library clears the
bss and sets up the data section. These are, of course, all
implementation-specific - but they will be clear and reliable methods.
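As a sketch of the first of those methods, assuming a GNU-style
toolchain whose linker script already provides a ".noinit" output
section that the startup code neither zeroes nor loads:

#include <stdint.h>

/* Kept in a section the startup code leaves untouched, so the values
   survive a warm reset.  After a true power-on they contain garbage,
   so a magic number decides whether they need to be initialised. */
__attribute__((section(".noinit"))) static uint32_t reset_count;
__attribute__((section(".noinit"))) static uint32_t reset_magic;

#define RESET_MAGIC 0x5AFE5AFEu

void note_reset(void)
{
    if (reset_magic != RESET_MAGIC) {
        reset_count = 0;             /* first start after power-on */
        reset_magic = RESET_MAGIC;
    }
    reset_count++;
}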

There are some compiler manufacturers (such as Texas Instruments) that
seem to think it is a good idea not to zero out uninitialised data (the
bss section) before main(). Their idea is that this makes startup
faster, and that because their hardware watchdogs are often enabled out
of reset (a questionable idea at best), they don't zero the bss to avoid
a risk of the watchdog triggering before main() starts. It is a silly
idea, and has led to much wailing and gnashing of teeth for developers
who find their programs behaving differently from run to run until they
learn of this "feature".
Post by s***@casperkitty.com
The Standard regards as implementation-defined the nature and means by which
an embedded program starts operation. It allows systems to define alternate
function signatures and startup patterns; I don't know whether a conforming
implementation could refrain from initializing global variables before it
called "int main(void)", but I think it document the behavior of "void main()"
as being similar to "int main(void)" but without zero-initialization.
They /could/ document it - but it would then not be C, because C
requires this zero initialisation (5.1.2, 6.7.9 p 10). In at least one
of TI's compiler manuals, I did see that it was documented - as a tiny
footnote, deep within the middle of the manual, with a comment that the
behaviour broke with the C standards.
Post by s***@casperkitty.com
Post by David Brown
"const" being treated as meaning "in flash", and using different
assembly instructions for access than ram data - thus casting a "char *"
pointer to a "const char *" pointer will break the program.
I somewhat prefer the approach of using different qualifiers for things which
are guaranteed to be in RAM, guaranteed to be in code space, or might be in
either, but using "const" could be considered a practical approach even though
it is non-conforming.
It was a terrible approach, because it silently broke code that used the
good programming practice of using const pointers when a function does
not change the data pointed at. It is far better to use explicit
extensions, such as a "flash" keyword, for that purpose.
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.

In most cases, these things are done in the mistaken belief that the
compiler writer is helping users, or making the tool easy for beginners,
or making a more "natural" language for the particular target. But it
means that people who know what they are doing are going to have
trouble because the compiler does not work the way they expect. And
people who /don't/ know what they are doing are going to have trouble -
because they will refer to books, websites, and expert opinions which
are all incompatible with the "helpful" compiler.
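A short illustration of the promotion rule in question, assuming the
usual case of an int that is wider than 16 bits:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 200, b = 200;
    /* Both operands are promoted to int before the multiplication, so
       the result is 40000.  A compiler that "helpfully" performed the
       arithmetic in 8 bits would give 64 (40000 % 256) and silently
       break code that had been tested on a desktop target. */
    int product = a * b;
    printf("%d\n", product);
    return 0;
}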
s***@casperkitty.com
2017-04-10 22:43:30 UTC
Post by David Brown
There are some compiler manufacturers (such as Texas Instruments) that
seem to think it is a good idea not to zero out uninitialised data (the
bss section) before main(). Their idea is that this makes startup
faster, and that because their hardware watchdogs are often enabled out
of reset (a questionable idea at best), they don't zero the bss to avoid
a risk of the watchdog triggering before main() starts. It is a silly
idea, and has led to much wailing and gnashing of teeth for developers
who find their programs behaving differently from run to run until they
learn of this "feature".
The Keil/ARM approach to that issue is to call a function called SystemInit
before any static-duration variables are initialized, and then initialize
static-duration variables and call main() after that. I would expect that
it would be possible to have SystemInit invoke the main program loop itself
without returning in cases where initialization was known to have happened
previously and should not be repeated, but my preferred approach is to use
separate sections for things that have to survive a reset.
Post by David Brown
Post by s***@casperkitty.com
The Standard regards as implementation-defined the nature and means by which
an embedded program starts operation. It allows systems to define alternate
function signatures and startup patterns; I don't know whether a conforming
implementation could refrain from initializing global variables before it
called "int main(void)", but I think it document the behavior of "void main()"
as being similar to "int main(void)" but without zero-initialization.
They /could/ document it - but it would then not be C, because C
requires this zero initialisation (5.1.2, 6.7.9 p 10). In at least one
of TI's compiler manuals, I did see that it was documented - as a tiny
footnote, deep within the middle of the manual, with a comment that the
behaviour broke with the C standards.
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
Post by David Brown
It was a terrible approach, because it silently broke code that used the
good programming practice of using const pointers when a function does
not change the data pointed at. It is far better to use explicit
extensions, such as a "flash" keyword, for that purpose.
The "const" pointers were treated as a "universal" pointer type that could
access things in either RAM or flash/ROM. A pointer type for things that
were known to be in flash would have improved efficiency in some cases,
though access through flash was generally slow enough that the extra time
to check whether a pointer was a RAM address wasn't a problem. The treatment
meant that a *const* was incompatible with a **, and meant that code which
needlessly applied "const" to pointers would run much more slowly than it
should, but maintained correct behavior in most cases.

Maintaining compatibility with 100% of code would require using two-byte
pointers for everything and generating a function call for every pointer
access. While requiring that code mark everyplace where a one-byte pointer
should be used might have worked, it would have required a lot of code
markup.
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.
A compiler could just as well promote to an even longer type at its
convenience, except in contexts like "sizeof", since operations on the
longer type would behave identically to "int" in all cases where the
Standard would impose any requirements upon the latter. Further, nothing
in the Standard would forbid a compiler where "int" is 32 bits from treating

unsigned mul(unsigned short x, unsigned short y) { return x*y; }

as equivalent to:

unsigned mul(unsigned short x, unsigned short y) { return 1U*x*y; }

Indeed, looking at the C89 rationale, I think the authors of the Standard
would have been shocked at the notion that a compiler for two's-complement
silent-wraparound hardware should do otherwise except perhaps during a
pedantic sanitizing build.
Keith Thompson
2017-04-10 23:10:12 UTC
***@casperkitty.com writes:
[...]
Post by s***@casperkitty.com
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
I can't think of any interpretation of that statement that is both
reasonable and correct. Can you rephrase it, with careful consideration
to what the phrase "undefined behavior" actually means (N1570 3.4.3)?
If you're using the phrase "undefined behavior" in a sense other than
that defined by the standard, please don't bother explaining.
Post by s***@casperkitty.com
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
If I understand that correctly, I agree. Using prototypes rather
than old-style declarations for greater clarity, if an implementation
doesn't define the behavior of `void main(void)`, then its behavior
is undefined -- and always, the actual behavior might be something
useful, whether it's documented/defined or not. An implementation
is free to define the behavior of `void main(void)` in any way it
likes (as long as it doesn't contradict the rest of the standard)
(5.1.2.2.1p1), and that behavior needn't match the behavior of
`int main(void)`.

(Where "the behavior of `void main(void)`" is verbal shorthand for "the
behavior of a program that defines `void main(void) { /* ... */ }`.)

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-10 23:26:54 UTC
Post by Keith Thompson
Post by s***@casperkitty.com
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
If I understand that correctly, I agree. Using prototypes rather
than old-style declarations for greater clarity, if an implementation
doesn't define the behavior of `void main(void)`, then its behavior
is undefined -- and always, the actual behavior might be something
useful, whether it's documented/defined or not. An implementation
is free to define the behavior of `void main(void)` in any way it
likes (as long as it doesn't contradict the rest of the standard)
(5.1.2.2.1p1), and that behavior needn't match the behavior of
`int main(void)`.
(Where "the behavior of `void main(void)`" is verbal shorthand for "the
behavior of a program that defines `void main(void) { /* ... */ }`.)
That's my thinking. And I think the initialization behavior would only be
required before a call to one of the forms of "int main()" that would be
described by the Standard; I certainly don't think the authors of the
Standard wanted to forbid the approach used by the Keil compiler (with its
call to SystemInit() before variable initialization), and if SystemInit never
returns, the implementation won't ever call main().
David Brown
2017-04-11 09:33:10 UTC
Post by s***@casperkitty.com
Post by David Brown
There are some compiler manufacturers (such as Texas Instruments) that
seem to think it is a good idea not to zero out uninitialised data (the
bss section) before main(). Their idea is that this makes startup
faster, and that because their hardware watchdogs are often enabled out
of reset (a questionable idea at best), they don't zero the bss to avoid
a risk of the watchdog triggering before main() starts. It is a silly
idea, and has led to much wailing and gnashing of teeth for developers
who find their programs behaving differently from run to run until they
learn of this "feature".
The Keil/ARM approach to that issue is to call a function called SystemInit
before any static-duration variables are initialized, and then initialize
static-duration variables and call main() after that. I would expect that
it would be possible to have SystemInit invoke the main program loop itself
without returning in cases where initialization was known to have happened
previously and should not be repeated, but my preferred approach is to use
separate sections for things that have to survive a reset.
Such "SystemInit" hooks are usually used for configuring external
memory, setting up clocks, handling watchdog setup, etc. They /can/ be
used for dealing with data that needs to be preserved over restarts. But I
agree that separate sections are usually the most convenient method.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
The Standard regards as implementation-defined the nature and means by which
an embedded program starts operation. It allows systems to define alternate
function signatures and startup patterns; I don't know whether a conforming
implementation could refrain from initializing global variables before it
called "int main(void)", but I think it document the behavior of "void main()"
as being similar to "int main(void)" but without zero-initialization.
They /could/ document it - but it would then not be C, because C
requires this zero initialisation (5.1.2, 6.7.9 p 10). In at least one
of TI's compiler manuals, I did see that it was documented - as a tiny
footnote, deep within the middle of the manual, with a comment that the
behaviour broke with the C standards.
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
I can't figure out what you are trying to say here. We are not talking
about undefined behaviour here - we are talking about a non-conforming
implementation of the compiler.
Post by s***@casperkitty.com
Post by David Brown
It was a terrible approach, because it silently broke code that used the
good programming practice of using const pointers when a function does
not change the data pointed at. It is far better to use explicit
extensions, such as a "flash" keyword, for that purpose.
The "const" pointers were treated as a "universal" pointer type that could
access things in either RAM or flash/ROM.
C does not make a distinction between RAM and ROM. /All/ pointers in C
to a given type are "universal" pointers. A const pointer is a pointer
which the /programmer/ promises he will not use to modify data. Data
that is defined as const is stronger - there the compiler /knows/ that
the data will never be modified, and is allowed to place it in read-only
memory.

Of course, a compiler implementation can provide extensions beyond this
model - and perhaps limitations if necessary in order to keep an
efficient implementation. But a compiler implementation may not usurp
normal standard C keywords and usage for such different behaviour.

In this particular case, the microcontroller in question was an AVR.
This is an 8-bit device, and it uses different CPU instructions for
accessing memory in RAM and memory in flash. The old compiler (I omit the
name to protect the guilty) was trying to make it convenient for casual
users to have data such as tables or strings in flash. In contrast, the
standard compiler for this chip is gcc which uses normal semantics and
normal standard C. This means that if you write "const int x = 1234;"
and take the address of x, then x will be placed in RAM - even though it
never changes. This is necessary to make pointers work properly, as in
normal C. The compiler provides extensions to allow you to put data
specifically in flash and access it directly (including a type of
universal pointer that can access anything).
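For concreteness, a sketch of what those extensions look like with
avr-gcc and avr-libc (PROGMEM and the pgm_read_* accessors from
<avr/pgmspace.h>); the object names are illustrative only:

#include <avr/pgmspace.h>

/* Explicitly placed in flash.  It cannot be read through an ordinary
   "const char *", so avr-libc's accessor is used instead of bending
   the meaning of "const". */
static const char greeting[] PROGMEM = "hello from flash";

char first_char_of_greeting(void)
{
    return (char)pgm_read_byte(&greeting[0]);
}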
Post by s***@casperkitty.com
A pointer type for things that
were known to be in flash would have improved efficiency in some cases,
though access through flash was generally slow enough that the extra time
to check whether a pointer was a RAM address wasn't a problem. The treatment
meant that a *const* was incompatible with a **, and meant that code which
needlessly applied "const" to pointers would run much more slowly than it
should, but maintained correct behavior in most cases.
Maintaining compatibility with 100% of code would require using two-byte
pointers for everything and generating a function call for every pointer
access. While requiring that code mark everyplace where a one-byte pointer
should be used might have worked, it would have required a lot of code
markup.
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.
A compiler could just as well promote to an even longer type at its
convenience, except in contexts like "sizeof", since operations on the
longer type would behave identically to "int" in all cases where the
Standard would impose any requirements upon the latter. Further, nothing
in the Standard would forbid a compiler where "int" is 32 bits from treating
unsigned mul(unsigned short x, unsigned short y) { return x*y; }
as equivalent to:
unsigned mul(unsigned short x, unsigned short y) { return 1U*x*y; }
Indeed, looking at the C89 rationale, I think the authors of the Standard
would have been shocked at the notion that a compiler for two's-complement
silent-wraparound hardware should do otherwise except perhaps during a
pedantic sanitizing build.
You don't need to repeat your favourite bugbear yet again - it is not
relevant here, any more than it is most of the times you bring it up.
s***@casperkitty.com
2017-04-11 15:37:56 UTC
Post by David Brown
Post by s***@casperkitty.com
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
I can't figure out what you are trying to say here. We are not talking
about undefined behaviour here - we are talking about a non-conforming
implementation of the compiler.
If some compiler X doesn't document any particular behavior for "void main()"
but such code happens to behave like "int main()" but without variable
initialization, such behavior would be conforming since use of "void main()"
would be UB, making all possible actions conforming.

If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
Post by David Brown
Post by s***@casperkitty.com
The "const" pointers were treated as a "universal" pointer type that could
access things in either RAM or flash/ROM.
C does not make a distinction between RAM and ROM. /All/ pointers in C
to a given type are "universal" pointers. A const pointer is a pointer
which the /programmer/ promises he will not use to modify data. Data
that is defined as const is stronger - there the compiler /knows/ that
the data will never be modified, and is allowed to place it in read-only
memory.
I'm well aware of that. The question is whether it's better to have a
compiler:

1. default to RAM-only pointers and require that all objects fit in
RAM unless they have a special qualifier that makes them only
accessible via specially-qualified pointers

2. default to universal pointers and require that any code that needs
to perform well with data in RAM use qualifiers to allow that, or
have the default behavior in the absence of vendor-specific
qualifiers depend upon whether things are "const" qualified. If I
were designing a compiler, I'd be inclined to allow command line
settings or #pragma directives to select among such behaviors, since
there are situations where each could be most useful, but the
non-conforming option is probably the one which would allow the
largest amount of code to work efficiently without modification.
Post by David Brown
Of course, a compiler implementation can provide extensions beyond this
model - and perhaps limitations if necessary in order to keep an
efficient implementation. But a compiler implementation may not usurp
normal standard C keywords and usage for such different behaviour.
Compilers can do whatever they want. Such behavior may make them non-
conforming, but in some situations a non-conforming compiler might be more
useful than any fully-conforming compiler ever could be. For example, on
an 8-bit processor, a compiler that has a single 32-bit floating-point type
may be more useful than one which can't pass floating-point values to
variadic functions without bundling 48-bit-or-larger math code. If code
never needs anything beyond standard float precision, the extra machine code
for larger types would represent a waste of time and space.
Post by David Brown
In this particular case, the microcontroller in question was an AVR.
This is an 8-bit device, and it uses different cpu instructions to
access memory in RAM from memory in Flash. The old compiler (I omit the
name to protect the guilty) was trying to make it convenient for casual
users to have data such as tables or strings in flash. In contrast, the
standard compiler for this chip is gcc which uses normal semantics and
normal standard C. This means that if you write "const int x = 1234;"
and take the address of x, then x will be placed in RAM - even though it
never changes. This is necessary to make pointers work properly, as
normal C. The compiler provides extensions to allow you to put data
specifically in flash and access it directly (including a type of
universal pointer that can access anything).
That's certainly a reasonable approach, and a quality compiler should
probably offer it as an option, but for many purposes a non-conforming
option that works like other compilers do may be more useful.
Post by David Brown
Post by s***@casperkitty.com
A pointer type for things that
were known to be in flash would have improved efficiency in some cases,
though access through flash was generally slow enough that the extra time
to check whether a pointer was a RAM address wasn't a problem. The treatment
meant that a *const* was incompatible with a **, and meant that code which
needlessly applied "const" to pointers would run much more slowly than it
should, but maintained correct behavior in most cases.
Maintaining compatibility with 100% of code would require using two-byte
pointers for everything and generating a function call for every pointer
access. While requiring that code mark everyplace where a one-byte pointer
should be used might have worked, it would have required a lot of code
markup.
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.
A compiler could just as well promote to an even longer type at its
convenience, except in contexts like "sizeof", since operations on the
longer type would behave identically to "int" in all cases where the
Standard would impose any requirements upon the latter....
You don't need to repeat your favourite bugbear yet again - it is not
relevant here, any more than it is most of the times you bring it up.
What about the other point (requoted above): given code like:

int64_t mul_and_add(int32_t x, int32_t y)
{
return x*y;
}

When generating code for a 32-bit processor that includes a 32x32->64
multiply instruction, the Standard would allow a compiler to, at its
leisure, either perform the multiply as 32x32->32 and sign-extend, or
else perform a 32x32->64 computation and return the result as-is? I
don't know of any compilers that would actually promise the latter,
but the Standard would certainly allow a compiler to do so. On an
implementation which uses a 32-bit "int" on a 64-bit processor, such a promise
would be rather cheap, though code would have to avoid expressions that
mixed uint32_t with int32_t or smaller types to avoid icky semantics.
David Brown
2017-04-12 22:20:30 UTC
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
I can't figure out what you are trying to say here. We are not talking
about undefined behaviour here - we are talking about a non-conforming
implementation of the compiler.
If some compiler X doesn't document any particular behavior for "void main()"
but such code happens to behave like "int main()" but without variable
initialization, such behavior would be conforming since use of "void main()"
would be UB, making all possible actions conforming.
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
You are still making no sense. The implementation's way of handling
main() cannot be undefined behaviour - it is defined by the
implementation. It might not conform to the C standards, but it can
never be undefined behaviour.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
The "const" pointers were treated as a "universal" pointer type that could
access things in either RAM or flash/ROM.
C does not make a distinction between RAM and ROM. /All/ pointers in C
to a given type are "universal" pointers. A const pointer is a pointer
which the /programmer/ promises he will not use to modify data. Data
that is defined as const is stronger - there the compiler /knows/ that
the data will never be modified, and is allowed to place it in read-only
memory.
I'm well aware of that. The question is whether it's better to have a
1. default to RAM-only pointers and require that all objects fit in
RAM unless they have a special qualifier that makes them only
accessible via specially-qualified pointers
2. default to universal pointers and require that any code that needs
to perform well with data in RAM use qualifiers to allow that, or
have the default behavior in the absence of vendor-specific
qualifiers depend upon whether things are "const" qualified. If I
were designing a compiler, I'd be inclined to allow command line
settings or #pragma directives to select among such behaviors, since
there are situations where each could be most useful, but the
non-conforming option is probably the one which would allow the
largest amount of code to work efficiently without modification.
/You/ might be inclined to allow weird behaviour, but we already know
you have strange ideas about what C is, was or should be.

The question /really/ is, is it better to have a compiler:

1. Accept normal C and work the way everyone expects a C compiler to
work, allowing good and safe code to work correctly. For some bits of
code, extension keywords or other implementation-specific features must
be used for optimal efficiency due to limitations of the target
architecture.

or

2. Change important keywords of C to work in a substantially different
way, breaking correct code, because you think it would be easier for
beginners to understand this non-standard variant of C.
Post by s***@casperkitty.com
Post by David Brown
Of course, a compiler implementation can provide extensions beyond this
model - and perhaps limitations if necessary in order to keep an
efficient implementation. But a compiler implementation may not usurp
normal standard C keywords and usage for such different behaviour.
Compilers can do whatever they want. Such behavior may make them non-
conforming, but in some situations a non-conforming compiler might be more
useful than any fully-conforming compiler ever could be. For example, on
an 8-bit processor, a compiler that has a single 32-bit floating-point type
may be more useful than one which can't pass floating-point values to
variadic functions without bundling 48-bit-or-larger math code. If code
never needs anything beyond standard float precision, the extra machine code
for larger types would represent a waste of time and space.
I agree that for some targets, it can be better to make some breakage to
C standards conformity. But that should be kept to the absolute
minimum, be well documented, not break working code silently, and
preferably require a command-line switch to activate it or give warnings
about the odd behaviour. It is reasonable, for example, for a C
implementation on a small 8-bit micro to make "double" limited to 32
bits. It is /not/ reasonable to change the meaning of a keyword like
"const" to mean something subtly different, or to skip the zeroing of
uninitialised data.
Post by s***@casperkitty.com
Post by David Brown
In this particular case, the microcontroller in question was an AVR.
This is an 8-bit device, and it uses different cpu instructions to
access memory in RAM from memory in Flash. The old compiler (I omit the
name to protect the guilty) was trying to make it convenient for casual
users to have data such as tables or strings in flash. In contrast, the
standard compiler for this chip is gcc which uses normal semantics and
normal standard C. This means that if you write "const int x = 1234;"
and take the address of x, then x will be placed in RAM - even though it
never changes. This is necessary to make pointers work properly, as
normal C. The compiler provides extensions to allow you to put data
specifically in flash and access it directly (including a type of
universal pointer that can access anything).
That's certainly a reasonable approach, and a quality compiler should
probably offer it as an option, but for many purposes a non-conforming
option that works like other compilers do may be more useful.
No, the "break the const keyword" approach is not more useful.
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
A pointer type for things that
were known to be in flash would have improved efficiency in some cases,
though access through flash was generally slow enough that the extra time
to check whether a pointer was a RAM address wasn't a problem. The treatment
meant that a *const* was incompatible with a **, and meant that code which
needlessly applied "const" to pointers would run much more slowly than it
should, but maintained correct behavior in most cases.
Maintaining compatibility with 100% of code would require using two-byte
pointers for everything and generating a function call for every pointer
access. While requiring that code mark everyplace where a one-byte pointer
should be used might have worked, it would have required a lot of code
markup.
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.
A compiler could just as well promote to an even longer type at its
convenience, except in contexts like "sizeof", since operations on the
longer type would behave identically to "int" in all cases where the
Standard would impose any requirements upon the latter....
You don't need to repeat your favourite bugbear yet again - it is not
relevant here, any more than it is most of the times you bring it up.
<snip the needless repetition>
s***@casperkitty.com
2017-04-12 22:55:50 UTC
Post by David Brown
Post by s***@casperkitty.com
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
You are still making no sense. The implementation's way of handling
main() cannot be undefined behaviour - it is defined by the
implementation. It might not conform to the C standards, but it can
never be undefined behaviour.
The Standard defines the behavior for "int main(void)". Where does it define
any behavior for "void main(void)"?
Post by David Brown
1. Accept normal C and work the way everyone expects a C compiler to
work, allowing good and safe code to work correctly. For some bits of
code, extension keywords or other implementation-specific features must
be used for optimal efficiency due to limitations of the target
architecture.
or
2. Change important keywords of C to work in a substantially different
way, breaking correct code, because you think it would be easier for
beginners to understand this non-standard variant of C.
No implementation will be able to run all correct code. Each of the above
approaches will run some programs that the other would not. If some
particular program will run usefully on the second approach, but would be
unable to run using the first because of resource constraints, then I would
suggest the second implementation would be more useful for the purpose of
running that program.
Post by David Brown
I agree that for some targets, it can be better to make some breakage to
C standards conformity. But that should be kept to the absolute
minimum, be well documented, not break working code silently, and
preferably require a command-line switch to activate it or give warnings
about the odd behaviour. It is reasonable, for example, for a C
implementation on a small 8-bit micro to make "double" limited to 32
bits. It is /not/ reasonable to change the meaning of a keyword like
"const" to mean something subtly different, or to skip the zeroing of
uninitialised data.
Most code won't need to convert a const-qualified pointer to a non-qualified
pointer and then later convert back to a const-qualified pointer and access
ROM/flash. While a compiler should have an option to make all pointers use
the same format as const-qualified pointers, I would not expect that most
programs would work better with that option disabled.
Post by David Brown
Post by s***@casperkitty.com
That's certainly a reasonable approach, and a quality compiler should
probably offer it as an option, but for many purposes a non-conforming
option that works like other compilers do may be more useful.
No, the "break the const keyword" approach is not more useful.
If a program would work correctly with or without that option, but
be faster without, or if it would work correctly with that option but would
not fit in the available code space without it, in what way would that
option make the implementation more useful for purposes of running that
program?

If the Atmel can't use a const-qualified pointer to access RAM (I've not
used that platform, so I don't know if it can), that would be a horribly
broken abuse of the "const" keyword, but if "const"-qualified pointers
can access either while unqualified pointers are limited to RAM, most
programs would not be affected by an inability to round-trip const pointers
through unqualified pointers nor directly use unqualified pointers to
access ROM.
Keith Thompson
2017-04-13 01:17:03 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
You are still making no sense. The implementation's way of handling
main() cannot be undefined behaviour - it is defined by the
implementation. It might not conform to the C standards, but it can
never be undefined behaviour.
The Standard defines the behavior for "int main(void)". Where does it define
any behavior for "void main(void)"?
If "void main(void)" is documented as "some other implementation-defined
manner" as described in 5.1.2.2.1p1, its behavior (other than its
termination status) is defined by the entire standard.

[...]
Post by s***@casperkitty.com
No implementation will be able to run all correct code. Each of the above
approaches will run some programs that the other would not. If some
particular program will run usefully on the second approach, but would be
unable to run using the first because of resource constraints, then I would
suggest the second implementation would be more useful for the purpose of
running that program.
Any acceptable implementation will be useful for more than the purpose
of running one specific program.

There are ways to define extensions without violating the standard, such
as "#pragma".

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-14 10:27:43 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
You are still making no sense. The implementation's way of handling
main() cannot be undefined behaviour - it is defined by the
implementation. It might not conform to the C standards, but it can
never be undefined behaviour.
The Standard defines the behavior for "int main(void)". Where does it define
any behavior for "void main(void)"?
I still don't get your hang-up about main() here, or your idea of the
implementation being undefined behaviour.

The standards make it clear that program lifetime objects without
explicit initialisation are zero'ed before the start of the program. A
freestanding C implementation can use whatever it likes as a start
function - it can be "void main(void)", or "int reset(int)", or whatever
it wants. That is for the implementation to decide. And if the
programmer uses a form that is not specified by the implementation, then
/that/ is undefined behaviour.
Post by s***@casperkitty.com
Post by David Brown
1. Accept normal C and work the way everyone expects a C compiler to
work, allowing good and safe code to work correctly. For some bits of
code, extension keywords or other implementation-specific features must
be used for optimal efficiency due to limitations of the target
architecture.
or
2. Change important keywords of C to work in a substantially different
way, breaking correct code, because you think it would be easier for
beginners to understand this non-standard variant of C.
No implementation will be able to run all correct code. Each of the above
approaches will run some programs that the other would not. If some
particular program will run usefully on the second approach, but would be
unable to run using the first because of resource constraints, then I would
suggest the second implementation would be more useful for the purpose of
running that program.
Clearly it is possible to write code that can be run by the second
implementation. And clearly it is possible to write code that /cannot/
be run by the first implementation (just use enough constant data to
overflow the ram in the microcontroller, avoiding the
implementation-specific features needed to keep that constant data in
flash). But with the second implementation, standard programs that
should be able to run (i.e., they are within the limitations of ram,
rom, etc.), will compile cleanly but have different and incorrect
behaviour compared to other conforming C implementations. And that is
the vital difference.
Post by s***@casperkitty.com
Post by David Brown
I agree that for some targets, it can be better to make some breakage to
C standards conformity. But that should be kept to the absolute
minimum, be well documented, not break working code silently, and
preferably require a command-line switch to activate it or give warnings
about the odd behaviour. It is reasonable, for example, for a C
implementation on a small 8-bit micro to make "double" limited to 32
bits. It is /not/ reasonable to change the meaning of a keyword like
"const" to mean something subtly different, or to skip the zeroing of
uninitialised data.
Most code won't need to convert a const-qualified pointer to a non-qualified
pointer and then later convert back to a const-qualified pointer and access
ROM/flash. While a compiler should have an option to make all pointers use
the same format as const-qualified pointers, I would not expect that most
programs would work better with that option disabled.
The problem is non-const data accessed through const qualified pointers.
And such code /is/ common:

extern int average(int noOfSamples, const int * pSamples);
int x[10];
int av = average(10, x);
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
That's certainly a reasonable approach, and a quality compiler should
probably offer it as an option, but for many purposes a non-conforming
option that works like other compilers do may be more useful.
No, the "break the const keyword" approach is not more useful.
If a program would work correctly with or without that option, but
be faster without, or if it would work correctly with that option but would
not fit in the available code space without it, in what way would that
option make the implementation more useful for purposes of running that
program?
A C compiler should not be made to work only with a particular
specialised variety of programs! A compiler sold as a "C Compiler"
should be made to correctly compile as many valid C programs as possible
given the limitations of the target.
Post by s***@casperkitty.com
If the Atmel can't use a const-qualified pointer to access RAM (I've not
used that platform, so I don't know if it can), that would be a horribly
broken abuse of the "const" keyword,
Exactly. (The limitation is for one particular compiler for the AVR,
not for all compilers for it.)
Post by s***@casperkitty.com
but if "const"-qualified pointers
can access either while unqualified pointers are limited to RAM, most
programs would not be affected by an inability to round-trip const pointers
through unqualified pointers nor directly use unqualified pointers to
access ROM.
Certainly that would be much less of a problem.
s***@casperkitty.com
2017-04-14 18:57:18 UTC
Permalink
Post by David Brown
The standards make it clear that program lifetime objects without
explicit initialisation are zero'ed before the start of the program. A
freestanding C implementation can use whatever it likes as a start
function - it can be "void main(void)", or "int reset(int)", or whatever
it wants. That is for the implementation to decide. And if the
programmer uses a form that is not specified by the implementation, then
/that/ is undefined behaviour.
The notion of "the start of a program" can be somewhat murky in some free-
standing scenarios. Many C-language tools generate code for independent
functions; once a collection of functions is loaded and RAM is set up for
them, any of them could be executed directly without having previously run
any other code produced by the C-language tools. Responsibility for any
zero-initialization would lie with the external loading process.

TI does bundle some assembly-language files which can be adjusted to
accommodate a variety of loading scenarios (code runs from flash; code gets
loaded into RAM from an external storage device and then run, etc.), and
it would probably have been a good idea to have them default to zero-
initializing storage. I'm not sure whether such files should be viewed as
really being part of the "C implementation", however, since one could do C
programming without them if e.g. one had a boot loader that could process
ELF files directly and set up RAM according to that spec (including zero-
initialization).
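For illustration, a minimal sketch of the zero-initialisation such startup
code performs before any C code runs; the symbol names __bss_start__ and
__bss_end__ are common linker-script conventions rather than anything
required by the Standard:

#include <stdint.h>

extern uint32_t __bss_start__;   /* provided by the linker script */
extern uint32_t __bss_end__;

void startup_zero_bss(void)
{
    uint32_t *p;
    for (p = &__bss_start__; p < &__bss_end__; ++p)
        *p = 0;                  /* clear the uninitialised-data region */
}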
Post by David Brown
Clearly it is possible to write code that can be run by the second
implementation. And clearly it is possible to write code that /cannot/
be run by the first implementation (just use enough constant data to
overflow the ram in the microcontroller, avoiding the
implementation-specific features needed to keep that constant data in
flash). But with the second implementation, standard programs that
should be able to run (i.e., they are within the limitations of ram,
rom, etc.), will compile cleanly but have different and incorrect
behaviour compared to other conforming C implementations. And that is
the vital difference.
I've not had problems with the const-pointer treatment breaking code; I'm
not sure how much otherwise-usable code would be broken by it. On many
parts, the difference in RAM and ROM sizes is sufficiently large (30:1 or
more) that a lot of programs would be nowhere close to being able to store
all their tables in RAM, but would still be nowhere near filling their code
space.
Post by David Brown
The problem is non-const data accessed through const qualified pointers.
extern int average(int noOfSamples, const int * pSamples);
int x[10];
int av = average(10, x);
On the PIC compilers, that will work without difficulty. The act of reading
a const pointer causes the compiler to check a bit in the pointer and either
access RAM or ROM. The situations the PIC has trouble with are using non-
const pointers to read const data, or round-tripping const pointers through
non-const pointers, both of which are much rarer than using const pointers to
read both const and non-const data.
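To make the round-trip case concrete, here is a short hypothetical example.
It is strictly conforming C, but a dialect that encodes the RAM/ROM
distinction only in const-qualified pointers loses that information at the
cast:

#include <stdio.h>

static const int table[3] = { 1, 2, 3 };

int main(void)
{
    const int *cp = table;     /* may point into ROM/flash on such targets */
    int *p = (int *)cp;        /* qualifier cast away; legal as long as the
                                  object is never written through p */
    const int *cp2 = p;        /* ...and restored */
    printf("%d\n", *cp2);      /* valid C, but the dialect may now fetch
                                  from the wrong address space */
    return 0;
}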

I don't know what the Atmel compiler does, but if it can't use const pointers
to read non-const data I would agree that is an unjustifiable abuse of the
"const" keyword which doesn't accommodate any useful scenarios that would not
be served just as well using a different keyword. The reason I regard the
PIC's behavior as somewhat reasonable [albeit obviously non-conforming] is
that it allows many programs to run efficiently without modification in a
way which would not be possible had another keyword been chosen. If the
Atmel compiler's implementation doesn't offer that advantage, I would see
no justification for re-purposing the keyword.
Post by David Brown
A C compiler should not be made to work only with a particular
specialised variety of programs! A compiler sold as a "C Compiler"
should be made to correctly compile as many valid C programs as possible
given the limitations of the target.
Since only a particular specialized variety of programs could possibly
do anything useful with 36 bytes of RAM and 1Kword of code space, it would
be impossible to write a compiler for such a platform that *wasn't* limited
to a particular specialized variety of programs.

More generally, most embedded compilers are intended largely for use on
systems which don't have any defined concept of console or file I/O, and
could thus only process usefully the particular specialized variety of
programs that are designed to operate with whatever hardware is built into
the target system.

The advantage of using C on many such systems isn't that one can simply grab
arbitrary bits of code from here and there and run it, but rather that a
programmer who wants to turn on a light need only worry about:

1. Which I/O port is the light hooked up to

2. Which bits of which addresses need to be written to activate that
port

Rather than also having to worry about the sequence of instructions that
might be needed to perform such an access, any restrictions that might exist
regarding the placement of such code and any data it requires, and also the
question of what one must do to ensure that code gets put into whatever kind
of code section is required on the target. The placement of the light and
the sequence of accesses needed to activate it will vary depending upon the
hardware platform, but there's no reason the syntax for performing such
accesses should need to.
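As a concrete sketch of the kind of access being described (the register
address, bit number, and names below are invented for illustration; real
values come from the device's datasheet or vendor headers):

#include <stdint.h>

#define GPIOA_ODR (*(volatile uint32_t *)0x40020014u)  /* hypothetical address */
#define LED_PIN   5u                                    /* hypothetical bit     */

static void led_on(void)
{
    GPIOA_ODR |= (1u << LED_PIN);   /* set the output bit that drives the light */
}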
David Brown
2017-04-17 16:08:23 UTC
Permalink
<snip>
Post by s***@casperkitty.com
I've not had problems with the const-pointer treatment breaking code; I'm
not sure how much otherwise-usable code would be broken by it. On many
parts, the difference in RAM and ROM sizes is sufficiently large (30:1 or
more) that a lot of programs would be nowhere close to being able to store
all their tables in RAM, but would still be nowhere near filling their code
space.
If you haven't used a compiler where turning a non-const pointer into a
const pointer breaks things, then of course you won't have had problems
with it. Most compilers are perfectly happy with such conversions.
Post by s***@casperkitty.com
Post by David Brown
The problem is non-const data accessed through const qualified pointers.
extern int average(int noOfSamples, const int * pSamples);
int x[10];
int av = average(10, x);
On the PIC compilers, that will work without difficulty. The act of reading
a const pointer causes the compiler to check a bit in the pointer and either
access RAM or ROM. The situations the PIC has trouble with are using non-
const pointers to read const data, or round-tripping const pointers through
non-const pointers, both of which are much rarer than using const pointers to
read both const and non-const data.
Those will be rarer situations, but may still mean legal code will fail.
Post by s***@casperkitty.com
I don't know what the Atmel compiler does,
Please stop referring to this compiler as "the Atmel compiler". It was
a compiler written by an independent development tool company for one of
Atmel's microcontroller families - it was /not/ a compiler from Atmel.
Post by s***@casperkitty.com
but if it can't use const pointers
to read non-const data I would agree that is an unjustifiable abuse of the
"const" keyword which doesn't accommodate any useful scenarios that would not
be served just as well using a different keyword.
Great, you agree with the point I made a good many posts at the start of
this sub-thread.
Post by s***@casperkitty.com
The reason I regard the
PIC's behavior as somewhat reasonable [albeit obviously non-conforming] is
that it allows many programs to run efficiently without modification in a
way which would not be possible had another keyword been chosen.
I agree that such behaviour is "somewhat reasonable". I'd much prefer a
command line switch, however.
Post by s***@casperkitty.com
If the
Atmel compiler's implementation doesn't offer that advantage, I would see
no justification for re-purposing the keyword.
The compiler I was talking about allowed /some/ programs to run more
efficiently without modification (specifically, those that had a lot of
constant data but never accessed non-const data through a const
pointer). But it would silently break other programs.
Post by s***@casperkitty.com
Post by David Brown
A C compiler should not be made to work only with a particular
specialised variety of programs! A compiler sold as a "C Compiler"
should be made to correctly compile as many valid C programs as possible
given the limitations of the target.
Since only a particular specialized variety of programs could possibly
do anything useful with 36 bytes of RAM and 1Kword of code space, it would
be impossible to write a compiler for such a platform that *wasn't* limited
to a particular specialized variety of programs.
That is why I added the qualification "as many as possible /given the
limitations of the target/".
Post by s***@casperkitty.com
More generally, most embedded compilers are intended largely for use on
systems which don't have any defined concept of console or file I/O, and
could thus only process usefully the particular specialized variety of
programs that are designed to operate with whatever hardware is built into
the target system.
The advantage of using C on many such systems isn't that one can simply grab
arbitrary bits of code from here and there and run it, but rather that a
1. Which I/O port is the light hooked up to
2. Which bits of which addresses need to be written to activate that
port
There are a good many other advantages in using C (compared to assembly,
which I think was implied by you). But I agree that one would not
expect to take arbitrary existing C code and find it useful on a small
microcontroller.
Post by s***@casperkitty.com
Rather than also having to worry about the sequence of instructions that
might be needed to perform such an access, any restrictions that might exist
regarding the placement of such code and any data it requires, and also the
question of what one must do to ensure that code gets put into whatever kind
of code section is required on the target. The placement of the light and
the sequence of accesses needed to activate it will vary depending upon the
hardware platform, but there's no reason the syntax for performing such
accesses should need to.
s***@casperkitty.com
2017-04-17 16:38:21 UTC
Permalink
Post by David Brown
Post by s***@casperkitty.com
I've not had problems with the const-pointer treatment breaking code; I'm
not sure how much otherwise-usable code would be broken by it. On many
parts, the difference in RAM and ROM sizes is sufficiently large (30:1 or
more) that a lot of programs would be nowhere close to being able to store
all their tables in RAM, but would still be nowhere near filling their code
space.
If you haven't used a compiler where turning a non-const pointer into a
const pointer breaks things, then of course you won't have had problems
with it. Most compilers are perfectly happy with such conversions.
On the PIC compilers, turning a non-const pointer into a const pointer is
not a problem, it is the reverse which is problematic. I make no effort
to justify any C dialect which can't accommodate non-const to const
conversions.
Post by David Brown
Post by s***@casperkitty.com
On the PIC compilers, that will work without difficulty. The act of reading
a const pointer causes the compiler to check a bit in the pointer and either
access RAM or ROM. The situations the PIC has trouble with are using non-
const pointers to read const data, or round-tripping const pointers through
non-const pointers, but of which are much rarer than using const pointers to
read both const and non-const data.
Those will be rarer situations, but may still mean legal code will fail.
Of course, requiring that all data be kept in RAM would mean that legal code
whose tables won't fit in RAM will fail. No compiler for a system with 192
bytes of RAM is going to be able to handle all code that should be legal.
Various compiler designs which forego the ability to run some should-be-legal
program may pick up the ability to run some others.
Post by David Brown
Post by s***@casperkitty.com
The reason I regard the
PIC's behavior as somewhat reasonable [albeit obviously non-conforming] is
that it allows many programs to run efficiently without modification in a
way which would not be possible had another keyword been chosen.
I agree that such behaviour is "somewhat reasonable". I'd much prefer a
command line switch, however.
Okay, I think our main point of divergence was that we were talking about
two different compilers' behaviors; I was insisting that the behavior I was
talking about was "somewhat reasonable", and you apparently agree; you were
insisting that the behavior you were talking about was unreasonable, and I
agree. I had been unaware that any compilers behaved as you described, and
you had been unaware that any behave as I describe, so now we both know of
another pattern that some compilers use.
Keith Thompson
2017-04-13 01:13:54 UTC
Permalink
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
I can't figure out what you are trying to say here. We are not talking
about undefined behaviour here - we are talking about a non-conforming
implementation of the compiler.
If some compiler X doesn't document any particular behavior for "void main()"
but such code happens to behave like "int main()" but without variable
initialization, such behavior would be conforming since use of "void main()"
would be UB, making all possible actions conforming.
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
You are still making no sense. The implementation's way of handling
main() cannot be undefined behaviour - it is defined by the
implementation. It might not conform to the C standards, but it can
never be undefined behaviour.
The phrase "undefined behavior" is defined in C11 3.4.3 as:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements

In general, if a construct has undefined behavior and an implementation
chooses to define it, it still has "undefined behavior" by that
definition.

But the behavior of the program `void main(void){}` is not
(unconditionally) undefined. By 5.1.2.2.1, the main function may
be defined "... or in some other implementation-defined manner".
If the current implementation documents its support for `void
main(void){/*...*/}`, then that's a permissible definition, and the
behavior is defined (but the termination status is unspecified).
If not, then the program violates a "shall" outside a constraint,
and its behavior is undefined.

Note that for an implementation that documents that it supports `void
main(void)`, all the requirements in the standard (aside from the
termination status) are in full force. Omitting zero-initialization of
static objects would still make the implementation non-conforming.

I suppose an implementation could (a) *not* document `void main(void)`
as "some other implementation-defined manner" under 5.1.2.2.1, but
(b) document `void main(void)` as an extension under 4p6, and then
do anything it likes. But that would be perverse.

An implementation can certainly support an extension that
doesn't zero-initialize static objects, as long as that extension
doesn't break any strictly conforming program. For example, a
compiler-specific #pragma would be OK. Triggering the behavior
based on the return type of main would IMHO be silly.

(Note that `main` is relevant only for hosted implementations.)

[...]
Post by David Brown
I agree that for some targets, it can be better to make some breakage to
C standards conformity. But that should be kept to the absolute
minimum, be well documented, not break working code silently, and
preferably require a command-line switch to activate it or give warnings
about the odd behaviour. It is reasonable, for example, for a C
implementation on a small 8-bit micro to make "double" limited to 32
bits. It is /not/ reasonable to change the meaning of a keyword like
"const" to mean something subtly different, or to skip the zeroing of
uninitialised data.
Wouldn't it be better to support 32-bit "float" and reject "double"?
It's non-conforming either way, but it would cleanly reject code that
uses "double" and quite reasonably expects it to be wider than 32
bits.

[...]
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-13 14:37:30 UTC
Permalink
Post by Keith Thompson
Wouldn't it be better to support 32-bit "float" and reject "double"?
It's non-conforming either way, but it would cleanly reject code that
uses "double" and quite reasonably expects it to be wider than 32
bits.
The Standard specifies that arguments to variadic functions get converted
to type "double". The Standard provides a means by which implementations
can report the level of precision that their "double" types support, but
specifies a minimum which requires using longer than a 32-bit significand.
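A small illustration of that rule (the default argument promotions): a float
argument is converted to double before a variadic function such as printf
ever sees it, which is why "%f" covers both.

#include <stdio.h>

int main(void)
{
    float  f = 1.5f;
    double d = 2.5;
    printf("%f %f\n", f, d);   /* f is promoted to double at the call */
    return 0;
}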
Keith Thompson
2017-04-13 15:43:18 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
Wouldn't it be better to support 32-bit "float" and reject "double"?
It's non-conforming either way, but it would cleanly reject code that
uses "double" and quite reasonably expects it to be wider than 32
bits.
The Standard specifies that arguments to variadic functions get converted
to type "double".
That's a good point. On the other hand, it might not apply to the tiny
8-bit system being discussed. I don't know whether it supports printf,
or specifically printf for floating-point types.
Post by s***@casperkitty.com
The Standard provides a means by which implementations
can report the level of precision that their "double" types support, but
specifies a minimum which requires using longer than a 32-bit significand.
True, but we're talking about a non-conforming implementation. The
question is which violation is better: making double 32 bits, or not
supporting double at all. The assumption is that the target supports
32-bit floating-point, but nothing wider (and it's not worthwhile to
support wider floating-point in software).

You can make double 32 bits, or you can drop support for double, change
the promotion rules, and change the type of floating-point constants.
Perhaps making double 32 bits is less intrusive.

And if you make double 32 bits, what about long double?
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-13 18:24:47 UTC
Permalink
Post by Keith Thompson
That's a good point. On the other hand, it might not apply to the tiny
8-bit system being discussed. I don't know whether it supports printf,
or specifically printf for floating-point types.
The Hi-Tech compilers for the PIC have a library option to include one of
three different versions of printf; the fanciest one supports floating-point
(using 32-bit "double") but the code space required for that printf function
by itself would exceed the storage capacity of the smaller chips.
Post by Keith Thompson
Post by s***@casperkitty.com
The Standard provides a means by which implementations
can report the level of precision that their "double" types support, but
specifies a minimum which requires using longer than a 32-bit significand.
True, but we're talking about a non-conforming implementation. The
question is which violation is better: making double 32 bits, or not
supporting double at all. The assumption is that the target supports
32-bit floating-point, but nothing wider (and it's not worthwhile to
support wider floating-point in software).
Some platforms have a 32-bit FPU but nothing wider; others have no FPU and
must do everything in software.
Post by Keith Thompson
You can make double 32 bits, or you can drop support for double, change
the promotion rules, and change the type of floating-point constants.
Perhaps making double 32 bits is less intrusive.
Having DBL_DIG be quantitatively smaller than the value given in the
Standard would make an implementation non-conforming (IMHO, that should have
been a QoI issue, given that code which needs any particular level of
precision could test for it) but having floating-point values passed to
variadic functions as a type incompatible with "double" would be
qualitatively wrong.

Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double". On 16-bit or 32-bit platforms, such a type
could be processed just as cheaply as an IEEE single, and even on 8-bit
platforms the cost wouldn't increase too badly. Going beyond a 32-bit
significand represents a big increase in computational cost on any system
which does not have hardware support for a larger type, but for many
applications the benefits of going beyond 32 bits would be far smaller than
the benefits of going from 24 to 32.
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Also 32 bits; if a project is going to include library code to handle wider
types, that code might as well be used for "double".
Keith Thompson
2017-04-13 18:49:36 UTC
Permalink
[...]
Post by s***@casperkitty.com
Post by Keith Thompson
You can make double 32 bits, or you can drop support for double, change
the promotion rules, and change the type of floating-point constants.
Perhaps making double 32 bits is less intrusive.
Having DBL_DIG be quantitatively smaller than the value given in the
Standard would make an implementation non-conforming (IMHO, that should have
been a QoI issue, given that code which needs any particular level of
precision could test for it) but having floating-point values passed to
variadic functions as a type incompatible with "double" would be
qualitatively wrong.
Hmm. I'd say non-conformance is non-conformance. Once you start
down that road, the main consideration should probably be programmer
convenience. (There should be a fairly high barrier to breaking
conformance in the first place; I'd rather have a conforming
implementation unless there's a *very* good reason not to.)
Post by s***@casperkitty.com
Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double".
Are you sure it doesn't? The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
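For any concrete implementation (as opposed to the Standard's minimums), the
actual characteristics can be read straight out of <float.h>; a minimal check
program:

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_DIG=%d  FLT_MANT_DIG=%d\n", FLT_DIG, FLT_MANT_DIG);
    printf("DBL_DIG=%d  DBL_MANT_DIG=%d  DBL_MAX_10_EXP=%d\n",
           DBL_DIG, DBL_MANT_DIG, DBL_MAX_10_EXP);
    printf("LDBL_DIG=%d  LDBL_MANT_DIG=%d\n", LDBL_DIG, LDBL_MANT_DIG);
    return 0;
}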

[...]
Post by s***@casperkitty.com
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Also 32 bits; if a project is going to include library code to handle
wider types, that code might as well be used for "double".
Type double is special, since it's the type of unsuffixed floating
constants and the promoted type for variadic functions. But long double
has no such special role, and any code that uses it is probably
depending on it to have extra range and/or precision. Rejecting any use
of "long double", it seems to me, would be a better way to warn you
that the target system doesn't support what you're asking for.

On the other hand, I don't program for such systems, and the opinions of
those who do should count more than mine.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Robert Wessel
2017-04-13 19:43:23 UTC
Permalink
Post by Keith Thompson
Post by s***@casperkitty.com
Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double".
Are you sure it doesn't? The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
He's suggesting that the mantissa be limited to ~32 bits, so as to
make it easy to implement on small systems. IOW, ones that can handle
32-bit *integers* with reasonable performance could likely handle a
32-bit mantissa in a straightforward manner, without special hardware. I
believe the requirements of DBL_DECIMAL_DIG make that impossible.
s***@casperkitty.com
2017-04-13 20:26:16 UTC
Permalink
Post by Robert Wessel
He's suggesting that the mantissa be limited to ~32 bits, so as to
make it easy to implement on small systems. IOW, ones that can handle
32-bit *integers* with reasonable performance could likely handle a
32-bit mantissa in a straightforward manner, without special hardware. I
believe the requirements of DBL_DECIMAL_DIG make that impossible.
Bingo. A 16-bit or 32-bit processor can handle a 32-bit mantissa as cheaply
as a 24-bit one, and a 64-bit mantissa as cheaply as a 53-bit one. On an 8-
bit system, when writing in machine code, the cost of a 40-bit mantissa would
not be terribly much greater than that of 32 bits, but using 32 bits would
allow some floating-point library functions to be written in C, using the
"unsigned long" type.
Keith Thompson
2017-04-13 20:32:26 UTC
Permalink
Post by Robert Wessel
Post by Keith Thompson
Post by s***@casperkitty.com
Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double".
Are you sure it doesn't? The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
He's suggesting that the mantissa be limited to ~32 bits, so as to
make it easy to implement on small systems. IOW, ones that can handle
32-bit *integers* with reasonable performance could likely handle a
32-bit mantissa in a straightforward manner, without special hardware. I
believe the requirements of DBL_DECIMAL_DIG make that impossible.
I believe you're right. A conforming 48-bit double is possible, but a
conforming 48-bit double with a 32-bit significand is not.

I can see the advantage of allowing more flexibility for floating-point
types, particularly for freestanding implementations. I doubt that such
flexibility would be particularly useful for hosted implementations.

On the other hand, implementations already have as much flexibility as
they need if they don't claim ISO C conformance. One approach is to
implement as much of ISO C as practical and *clearly* document any
exceptions. Features that might reasonably be omitted include
floating-point wider than 32 bits, *all* floating-point, and 64-bit
integers. The question (well, *a* question) is whether it's worthwhile
to have the standard explicitly permit such omissions.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-13 21:27:50 UTC
Permalink
Post by Keith Thompson
On the other hand, implementations already have as much flexibility as
they need if they don't claim ISO C conformance. One approach is to
implement as much of ISO C as practical and *clearly* document any
exceptions. Features that might reasonably be omitted include
floating-point wider than 32 bits, *all* floating-point, and 64-bit
integers. The question (well, *a* question) is whether it's worthwhile
to have the standard explicitly permit such omissions.
Indeed. Another feature that is sometimes omitted in the absence of
special compiler options is support for recursive function calls. While
that is in some ways a limitation, it allows the linker to reject programs
whose automatic variable usage would overflow the stack, rather than having
such programs crash at runtime.

Presently, the Standard does not specify anything useful about what level of
nested subroutine calls an implementation must support, nor about what may
happen if code goes beyond that. It also does not specify what level of
nesting may be used in a Strictly Conforming program. Consequently, the
question of whether an implementation can execute a given Strictly Conforming
Program without jumping the rails is a "quality-of-implementation" issue.

I would suggest that the question of which programs an implementation can
*usefully* process should be a QoI issue, but anything worthy of being called
a "standard" should define categories of implementations and programs such
that behavior would always be at least loosely defined (chosen in Unspecified
fashion from Implementation-Defined alternatives) provided Implementation-
Defined environmental requirements are satisfied. Having an implementation
be able to usefully run a wide range of programs is helpful, but a guarantee
that it will reject any programs it can't run usefully would be even better.
Robert Wessel
2017-04-13 22:39:12 UTC
Permalink
Post by s***@casperkitty.com
Post by Keith Thompson
On the other hand, implementations already have as much flexibility as
they need if they don't claim ISO C conformance. One approach is to
implement as much of ISO C as practical and *clearly* document any
exceptions. Features that might reasonably be omitted include
floating-point wider than 32 bits, *all* floating-point, and 64-bit
integers. The question (well, *a* question) is whether it's worthwhile
to have the standard explictly permit such omissions.
Indeed. Another feature that is sometimes omitted in the absence of
special compiler options is support for recursive function calls. While
that is in some ways a limitation, it allows the linker to reject programs
whose automatic variable usage would overflow the stack, rather than having
such programs crash at runtime.
Presently, the Standard does not specify anything useful about what level of
nested subroutine calls an implementation must support, nor about what may
happen if code goes beyond that. It also does not specify what level of
nesting may be used in a Strictly Conforming program. Consequently, the
question of whether an implementation can execute a given Strictly Conforming
Program without jumping the rails is a "quality-of-implementation" issue.
I would suggest that the question of which programs an implementation can
*usefully* process should be a QoI issue, but anything worthy of being called
a "standard" should define categories of implementations and programs such
that behavior would always be at least loosely defined (chosen in Unspecified
fashion from Implementation-Defined alternatives) provided Implementation-
Defined environmental requirements are satisfied. Having an implementation
be able to usefully run a wide range of programs is helpful, but a guarantee
that it will reject any programs it can't run usefully would be even better.
The issue of whether or not recursion is supported is separate from
the one regarding available "stack" space.

Some small systems do not support stack-based allocation in a
convenient way, and compilers for them tend to allocate all local variables
statically, as if they were C statics. Such an implementation would likely not
support recursion.
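A hypothetical illustration of why that rules out recursion: if a function
has exactly one statically-allocated instance of its locals, a recursive call
overwrites the caller's values. The code below is ordinary, correct C; the
comments describe what would go wrong under such a static-allocation scheme.

unsigned fact(unsigned n)
{
    unsigned r;              /* one shared instance under static allocation */
    if (n <= 1)
        return 1;
    r = fact(n - 1);         /* the nested call clobbers the shared n...    */
    return n * r;            /* ...so n no longer holds the caller's value  */
}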

Only in the case of a program with no recursion, no function pointers,
no VLAs, no alloca, complete source visibility, etc., can the compiler
generally make a compile time determination of stack usage (although
some simple cases involving those would be amenable to analysis).
Removing recursion from the list doesn't solve the problem. Some sort
of runtime mechanism is likely needed if applications are going to
deal with "stack" overflow.


*"stack" used above in the sense of a LIFO structure, not specifically
a hardware stack.
s***@casperkitty.com
2017-04-13 23:16:10 UTC
Permalink
Post by Robert Wessel
Only in the case of a program with no recursion, no function pointers,
no VLAs, no alloca, complete source visibility, etc., can the compiler
generally make a compile time determination of stack usage (although
some simple cases involving those would be amenable to analysis).
Removing recursion from the list doesn't solve the problem. Some sort
of runtime mechanism is likely needed if applications are going to
deal with "stack" overflow.
Compilers I've used on the PIC and 8051 both supplied
information about automatic variable usage and called functions to the linker,
which could then build a call graph. A call to a function pointer of a given
type was regarded as a potential call to every compatible function whose
address had been taken. I don't remember whether the 8051 call graph counted
any temporary variables the compiler pushed on the hardware stack, but then
again I don't remember whether it ever actually used any. On the PIC, if the
total depth of the main line and interrupt call stacks was below the hardware
stack limit, a stack overflow couldn't happen.

If compilers for other platforms included similar logic and built upon it
slightly, it would be possible for languages to allow recursion while
allowing static validation of stack usage if the language included a
"__stack_safe" intrinsic which would return 1 only when it could statically
guarantee stack safety at that point, and 0 otherwise. This would require
that the
object file format allow functions to say, e.g.

When all __stack_safe checks return 0:
  Calls foo with 32 bytes on the stack
  Calls bar with 68 bytes on the stack
  Puts 192 bytes on the stack when not calling anything

When __stack_safe check #1 returns 1:
  Calls foo with 32 bytes on the stack
  Calls bar with 68 bytes on the stack
  Calls boz with 96 bytes on the stack
  Puts 192 bytes on the stack when not calling anything

When __stack_safe check #2 returns 1:
  Calls moo with 48 bytes on the stack
  Calls boz with 96 bytes on the stack
  Puts 192 bytes on the stack when not calling anything

A linker could then compute how much space would need to be available on
the stack to allow any function to be called *safely*, and also compute
for each __stack_safe check how much stack space would need to be available
to allow it to safely return 1. Programs could safely use recursion without
risk of stack overflow if there were at least one __stack_safe check on any
recursive path, and there would be no need for any implementation-specific
measures of stack capacity. Provided that the linker were informed of the
worst-case stack usage for things like interrupts, assembly-language
functions, etc. it should be possible and practical to statically verify
stack usage without having to artificially constrain the depth of recursion
that could be used for things like structure parsing.
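A sketch of how such an intrinsic might look at the point of use.
"__stack_safe" is hypothetical - no current compiler provides it - so it is
stubbed out with a macro here purely so the fragment compiles; a real
implementation would substitute an answer verified by the linker.

#include <stddef.h>

#ifndef __stack_safe
#define __stack_safe() 1     /* placeholder for the hypothetical intrinsic */
#endif

struct node { struct node *left, *right; };

static size_t count_nodes(const struct node *n)
{
    if (n == NULL)
        return 0;
    if (!__stack_safe())     /* refuse to recurse beyond the verified depth */
        return 0;            /* (a real program would report the failure)   */
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}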
Tim Rentsch
2017-04-17 00:53:22 UTC
Permalink
Post by Robert Wessel
Post by Keith Thompson
Post by s***@casperkitty.com
Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double".
Are you sure it doesn't? The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
He's suggesting that the mantissa be limited to ~32 bits, so as to
make it easy to implement on small systems. IOW, ones that can handle
32 bit *integers* with reasonable performance, could likely handle a
32 bit in a straight-forward manner, and without special hardware. I
believe the requirements of DBL_DECIMAL_DIG make that impossible.
Right. The minimum value for DBL_DECIMAL_DIG implies that just
over 33 bits are needed (and that's for a decimal base - the
number of bits needed is higher for any non-power-of-10 base).
s***@casperkitty.com
2017-04-13 20:19:52 UTC
Permalink
Post by Keith Thompson
[...]
Post by s***@casperkitty.com
Post by Keith Thompson
You can make double 32 bits, or you can drop support for double, change
the promotion rules, and change the type of floating-point constants.
Perhaps making double 32 bits is less intrusive.
Having DBL_DIG be quantitatively smaller than the value given in the
Standard would make an implementation non-conforming (IMHO, that should have
been a QoI issue, given that code which needs any particular level of
precision could test for it) but having floating-point values passed to
variadic functions as a type incompatible with "double" would be
qualitatively wrong.
Hmm. I'd say non-conformance is non-conformance. Once you start
down that road, the main consideration should probably be programmer
convenience. (There should be a fairly high barrier to breaking
conformance in the first place; I'd rather have a conforming
implementation unless there's a *very* good reason not to.)
Floating-point libraries tend to be code hogs; code which supported a minimal
conforming "double" and was only twice as slow for addition would likely need
to be about twice as big as code for an IEEE single. Actually, some low-end
compilers have an option to pare floating-point types back to a 16-bit
significand (for double as well as float), and by my understanding such
options are used fairly often, given the savings in code size and execution
time.
Post by Keith Thompson
Post by s***@casperkitty.com
Personally, I think systems without floating-point units should use 40- or
48-bit computational types with a 31-bit or 32-bit significand, and think
it unfortunate that the Standard would not have allowed implementations to
use such a type as "double".
Are you sure it doesn't? The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
For performance, the key on 16-bit and larger systems is to keep the
significand to 32 bits or less, which is good for 7-9 digits. A 40-bit
significand would be adequate, and could easily fit in a 48-bit type,
but on 16-bit and 32-bit platforms it would cost just as much to process
as a 48-bit significand.
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Also 32 bits; if a project is going to include library code to handle
wider types, that code might as well be used for "double".
Type double is special, since it's the type of unsuffixed floating
constants and the promoted type for variadic functions. But long double
has no such special role, and any code that uses it is probably
depending on it to have extra range and/or precision. Rejecting any use
of "long double", it seems to me, would be a better way to warn you
that the target system doesn't support what you're asking for.
It's possible the compiler rejects "long double". The behavior of floating-
point types was sufficiently poorly specified that it's unclear what any
particular author means when using that type. The purpose for which the
IEEE extended type was created was to serve as a temporary computation type
for use within expressions, but if a programmer wants to use manual common
sub-expression elimination it's necessary to have a type that can be used to
represent temporaries. For example, if a programmer wants to manually remove
the redundant computation of (a+b) in

double a,b,c1,c2; // Assign values somehow
double x = a+b+c1;
double y = a+b+c2;

on a system which uses an extended-precision temporary type (i.e. where
FLT_EVAL_METHOD==2), the proper way to do that would be:

double a,b,c1,c2; // Assign values somehow
long double temp=a+b;
double x = temp+c1;
double y = temp+c2;

Note that what the code really needs is not that "temp" have any particular
level of precision, but rather that it be capable of holding the temporary
result "a+b" with the same precision as it would have within a larger
expression. If the authors of the Standard had specified that all floating-
point arguments, including "long double", would be passed as "double" except
in cases where there exists a prototype or the argument is a cast expression,
the "long double" type would probably have been a lot more popular since
all of:

printf("%10.5f", expressionYieldingDouble);
printf("%10.5f", expressionYieldingLongDouble);
printf("%10.5Lf", (long double)expressionYieldingDouble);
printf("%10.5Lf", (long double)expressionYieldingLongDouble);

would behave sensibly (the second line might not produce optimal precision,
but in most cases would be adequate).
Tim Rentsch
2017-04-17 00:49:33 UTC
Permalink
[..question about size needed for floating-point types..]
[...] The standard doesn't require double to be 64
bits. It requires DBL_DECIMAL_DIG to be at least 10 (that's about 34
bits) with a decimal exponent range of at least +/-37 (about 7 bits
including the exponent sign). I don't think you could satisfy that in
40 bits, but I think 48 would be plenty. This is a back-of-the-envelope
estimate, and there could be several off-by-one errors; perhaps someone
else could check this.
Right, 40 bits is not enough to satisfy the Standard's minimum
requirements for double. It is just barely enough if a sign bit
is not needed, but taking negative numbers into account pushes
the minimum up to 41 bits. (And that bound holds only for
representations that use a decimal base.)
David Brown
2017-04-17 16:14:37 UTC
Permalink
Post by Keith Thompson
Post by s***@casperkitty.com
Post by Keith Thompson
Wouldn't it be better to support 32-bit "float" and reject "double"?
It's non-conforming either way, but it would cleanly reject code that
uses "double" and quite reasonably expects it to be wider than 32
bits.
The Standard specifies that arguments to variadic functions get converted
to type "double".
That's a good point. On the other hand, it might not apply to the tiny
8-bit system being discussed. I don't know whether it supports printf,
or specifically printf for floating-point types.
It is not uncommon in development tools for small microcontrollers to
have limited or missing support for printf of floating point types, or
to make such support optional (since it will make the printf function
significantly bigger, even if the feature is not used by the program).
Post by Keith Thompson
Post by s***@casperkitty.com
The Standard provides a means by which implementations
can report the level of precision that their "double" types support, but
specifies a minimum which requires using longer than a 32-bit significand.
True, but we're talking about a non-conforming implementation. The
question is which violation is better: making double 32 bits, or not
supporting double at all. The assumption is that the target supports
32-bit floating-point, but nothing wider (and it's not worthwhile to
support wider floating-point in software).
Yes, that's the issue.
Post by Keith Thompson
You can make double 32 bits, or you can drop support for double, change
the promotion rules, and change the type of floating-point constants.
Perhaps making double 32 bits is less intrusive.
That is the most common opinion.
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Typically, it is non-existent. There are a good many embedded
development tools where the support for C99 is limited, and this is a
common limitation.
Keith Thompson
2017-04-17 18:54:35 UTC
Permalink
[...]
Post by David Brown
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Typically, it is non-existent. There are a good many embedded
development tools where the support for C99 is limited, and this is a
common limitation.
This isn't the first time I've seen the incorrect idea that long double
was added in C99. In fact it existed in C89/C90. (C99 did add <math.h>
functions for float and long double; C90's <math.h> only provided
functions for double.) The ANSI C Rationale mentions "long double" as a
new type.

Still, dropping long double for a *non-conforming* compiler for a small
embedded system is likely to be a good idea.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-17 20:41:35 UTC
Permalink
Post by Keith Thompson
[...]
Post by David Brown
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Typically, it is non-existent. There are a good many embedded
development tools where the support for C99 is limited, and this is a
common limitation.
This isn't the first time I've seen the incorrect idea that long double
was added in C99. In fact it existed in C89/C90. (C99 did add <math.h>
functions for float and long double; C90's <math.h> only provided
functions for double.) The ANSI C Rationale mentions "long double" as a
new type.
Really? That is news to me - but then, I come here to learn!

The only compilers I have used that have supported long double have been
those with good C99 support (like gcc, Code Warrior, Green Hills, and a
few others) - while those with little or no C99 support invariably have
no long doubles either.
Post by Keith Thompson
Still, dropping long double for a *non-conforming* compiler for a small
embedded system is likely to be a good idea.
Indeed. For most tools for small microcontrollers, that sort of
non-conformity is to be expected. And no one (or very few people) are
going to miss long double support on their 8 bit microcontrollers.
Keith Thompson
2017-04-17 22:02:34 UTC
Permalink
Post by David Brown
Post by Keith Thompson
[...]
Post by David Brown
Post by Keith Thompson
And if you make double 32 bits, what about long double?
Typically, it is non-existent. There are a good many embedded
development tools where the support for C99 is limited, and this is a
common limitation.
This isn't the first time I've seen the incorrect idea that long double
was added in C99. In fact it existed in C89/C90. (C99 did add <math.h>
functions for float and long double; C90's <math.h> only provided
functions for double.) The ANSI C Rationale mentions "long double" as a
new type.
Really? That is news to me - but then, I come here to learn!
The only compilers I have used that have supported long double have been
those with good C99 support (like gcc, Code Warrior, Green Hills, and a
few others) - while those with little or no C99 support invariably have
no long doubles either.
Type long double has exactly the same minimum requirements as double, so
any compiler that *correctly* supports double can support long double
without much extra effort. But for a non-conforming compiler that
provides a narrower representation for double, supporting long double
would not add much value.
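A quick way to see what any particular implementation actually provides
is to print the <float.h> limits (plain standard C, nothing
implementation-specific assumed):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        /* The standard only sets minimum requirements; an implementation
           is free to make long double identical to double. */
        printf("float:       %2d digits, max %e\n", FLT_DIG, FLT_MAX);
        printf("double:      %2d digits, max %e\n", DBL_DIG, DBL_MAX);
        printf("long double: %2d digits, max %Le\n", LDBL_DIG, LDBL_MAX);
        return 0;
    }
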
Post by David Brown
Post by Keith Thompson
Still, dropping long double for a *non-conforming* compiler for a small
embedded system is likely to be a good idea.
Indeed. For most tools for small microcontrollers, that sort of
non-conformity is to be expected. And very few people, if any, are
going to miss long double support on their 8-bit microcontrollers.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
David Brown
2017-04-14 10:40:32 UTC
Permalink
<snip>
Post by Keith Thompson
Note that for an implementation that documents that it supports `void
main(void)`, all the requirements in the standard (aside from the
termination status) are in full force. Omitting zero-initialization of
static objects would still make the implementation non-conforming.
I suppose an implementation could (a) *not* document `void main(void)`
as "some other implementation-defined manner" under 5.1.2.2.1, but
(b) document `void main(void)` as an extension under 4p6, and then
do anything it likes. But that would be perverse.
An implementation can certainly support an extension that
doesn't zero-initialize static objects, as long as that extension
doesn't break any strictly conforming program. For example, a
compiler-specific #pragma would be OK. Triggering the behavior
based on the return type of main would IMHO be silly.
A common way to have data that is not zero initialised is to specify
that it be placed in a specific data section (rather than the common
standard .bss). It is nice, clear, and doesn't break anything. That's
the way I like to see it done.
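For illustration, a minimal sketch of that approach (the section
attribute is gcc-specific, and the ".noinit" section name is an
assumption - the linker script and startup code have to provide a
matching section that is left untouched):

    #include <stdint.h>

    /* Placed in a separate ".noinit" section, so the C startup code
       does not zero it; it keeps whatever value it had across a reset,
       within the limits of the hardware. */
    __attribute__((section(".noinit")))
    static volatile uint32_t boot_count;

    void note_boot(void)
    {
        boot_count++;
    }
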
Post by Keith Thompson
(Note that `main` is relevant only for hosted implementations.)
[...]
Post by David Brown
I agree that for some targets, it can be better to make some breakage to
C standards conformity. But that should be kept to the absolute
minimum, be well documented, not break working code silently, and
preferably require a command-line switch to activate it or give warnings
about the odd behaviour. It is reasonable, for example, for a C
implementation on a small 8-bit micro to make "double" limited to 32
bits. It is /not/ reasonable to change the meaning of a keyword like
"const" to mean something subtly different, or to skip the zeroing of
uninitialised data.
Wouldn't it be better to support 32-bit "float" and reject "double"?
It's non-conforming either way, but it would cleanly reject code that
uses "double" and quite reasonably expects it to be wider than 32
bits.
That would certainly have its advantages. However, it is very common to
use doubles in code without really thinking about it:

    float x, y, z;
    z = 2.0 * x + y;

The 2.0 is a double - to keep everything in floats, it would need to be
2.0f. Making doubles the same as floats keeps the source code simple.
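A small standalone illustration of the difference (plain standard C,
nothing target-specific assumed):

    #include <stdio.h>

    int main(void)
    {
        float x = 1.5f, y = 0.25f, z;

        z = 2.0 * x + y;    /* 2.0 is a double: x is converted up, the
                               arithmetic is done in double, and the
                               result is converted back down to float */
        z = 2.0f * x + y;   /* with the f suffix everything stays in
                               float */

        printf("%f\n", z);  /* the float is promoted to double here
                               anyway, as it is a variadic argument */
        return 0;
    }
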

The ideal solution, I think, is the way gcc handles it for devices
like the ARM Cortex-M4F - a 32-bit microcontroller with hardware support
for single precision floating point but not double precision. Without
options, it works as standard C - doubles are 64-bit. You can have a
switch for warnings if your code uses doubles, and you can use a switch
to make literals such as 2.0 act as single-precision. You have all the
options and controls you might want, but by default everything is standard.
s***@casperkitty.com
2017-04-14 19:10:13 UTC
Permalink
Post by David Brown
That would certainly have its advantages. However, it is very common to
float x, y, z;
z = 2.0 * x + y;
The 2.0 is a double - to keep everything in floats, it would need to be
2.0f. Making doubles the same as floats keeps the source code simple.
If the rules for "long double" had said that they are passed the same as
"double" except when either a prototype exists that specifies "long double"
or the argument expression *is* an explicit cast to long double, then all
constants could have supplied long double values at the compiler's leisure,
perhaps with a language rule that would indicate that given expressions
like:

    float f = someFloat * 0.1;
    double d = someDouble * 0.1;

a compiler would be allowed to perform the multiplication using whichever
type was most convenient and a constant suitable for that type (if there
were such a rule, there should be a suffix to explicitly indicate double
literals, in case one wanted to force the use of extra precision).
Post by David Brown
The ideal solution, I think, is the way gcc handles it for devices
like the ARM Cortex-M4F - a 32-bit microcontroller with hardware support
for single precision floating point but not double precision. Without
options, it works as standard C - doubles are 64-bit. You can have a
switch for warnings if your code uses doubles, and you can use a switch
to make literals such as 2.0 act as single-precision. You have all the
options and controls you might want, but by default everything is standard.
What about variadic functions?
Keith Thompson
2017-04-14 20:22:44 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
That would certainly have its advantages. However, it is very common to
float x, y, z;
z = 2.0 * x + y;
The 2.0 is a double - to keep everything in floats, it would need to be
2.0f. Making doubles the same as floats keeps the source code simple.
If the rules for "long double" had said that they are passed the same as
"double" except when either a prototype exists that specifies "long double"
or the argument expression *is* an explicit cast to long double, then all
constants could have supplied long double values at the compiler's leisure,
perhaps with a language rule that would indicate that given expressions
float f = someFloat * 0.1;
double d = someDouble * 0.1;
a compiler would be allowed to perform the multiplication using whichever
type was most convenient and a constant suitable for that type (if there
were such a rule, there should be a suffix to explicitly indicate double
literals, in case one wanted to force the use of extra precision).
So you propose that given
    long double x;
these calls:
    func(x)
    func((long double)x)
should have different semantics.

I hate that idea. I'm not sure what you're trying to achieve, but I
suggest finding a different way to do it.

[...]

The idea, I think, is to provide a *non-conforming* C implementation for
a target that supports 32-bit floating-point but nothing wider (assuming
software floating-point is impractical or not worthwhile). Given the
central role played by type double, I'm thinking a good approach would
be to make float and double both 32 bits while keeping all the other
language rules in place. Type long double could either be dropped or
made 32 bits. I suggest dropping it, since code that uses long double
probably depends on extra precision and/or range.
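Code that depends on that extra precision can at least make the
assumption visible and fail loudly on such a target (a sketch using
C11's _Static_assert; the threshold of 53 mantissa bits matches the
usual 64-bit IEEE double):

    #include <float.h>

    /* Refuse to build if double is narrower than the usual 64-bit IEEE
       type - useful when porting code that silently relies on the extra
       precision and range. */
    _Static_assert(DBL_MANT_DIG >= 53,
                   "this code relies on a full-width double");
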
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
s***@casperkitty.com
2017-04-15 00:14:59 UTC
Permalink
Post by Keith Thompson
So you propose that given
long double x;
func(x)
func((long double)x)
should have different semantics.
There would be no reason to invoke a single-argument function without a
prototype in scope. Having a rule that says that all floating-point
arguments are passed as the same type, period, would be the logical way
to handle things save for one problem: if existing code on a system
expects that floating-point values will be passed as 64 bits, but there
is a need to pass a more precise floating-point number to something like
printf, there needs to be some syntax for saying "Pass this particular
argument as an extended-precision type rather than converting it to a
64-bit double." Perhaps it would be better to use some syntax other than
using the existing cast syntax for that purpose, but the key point is that
all floating-point arguments should be passed in compatible fashion
except when something special is done to explicitly request otherwise.
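For reference, the promotion rule that makes variadic calls the
sticking point here (plain ISO C, no extensions assumed):

    #include <stdio.h>

    int main(void)
    {
        float f = 3.25f;

        /* Default argument promotions apply to the variadic part of
           the call: f is converted to double before it is passed, and
           %f then reads a double from the argument list.  An
           implementation that shrinks or widens its floating-point
           types has to decide how this convention is affected. */
        printf("%f\n", f);
        return 0;
    }
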
David Brown
2017-04-12 22:20:41 UTC
Permalink
Post by s***@casperkitty.com
Post by David Brown
Post by s***@casperkitty.com
I don't think the Standard allows compilers to regard as Undefined Behavior
the use of any entry-function signature other than those explicitly described.
If an implementation that didn't describe any behavior for "void main()"
would be allowed to treat it as UB, I see no reason that it shouldn't be
allowed to describe a useful behavior that differs from "int main()".
I can't figure out what you are trying to say here. We are not talking
about undefined behaviour here - we are talking about a non-conforming
implementation of the compiler.
If some compiler X doesn't document any particular behavior for "void main()",
but such code happens to behave like "int main()" except without variable
initialization, such behavior would be conforming, since use of "void main()"
would be UB, making all possible actions conforming.
If some compiler Y were identical to X except that it *documented* that
"void main()" would work as it does in compiler X, then "void main()" would
not be UB on compiler Y, but I see no reason that compiler Y shouldn't be
conforming.
Post by David Brown
Post by s***@casperkitty.com
The "const" pointers were treated as a "universal" pointer type that could
access things in either RAM or flash/ROM.
C does not make a distinction between RAM or ROM. /All/ pointers in C
to a given type are "universal" pointers. A const pointer is a pointer
which the /programmer/ promises he will not use to modify data. Data
that is defined as const is stronger - there the compiler /knows/ that
the data will never be modified, and is allowed to place it in read-only
memory.
I'm well aware of that. The question is whether it's better to have a
1. default to RAM-only pointers and require that all objects fit in
RAM unless they have a special qualifier that makes them only
accessible via specially-qualified pointers
2. default to universal pointers and require that any code that needs
to perform well with data in RAM use qualifiers to allow that, or
have the default behavior in the absence of vendor-specific
qualifiers depend upon whether things are "const" qualified. If I
were designing a compiler, I'd be inclined to allow command line
settings or #pragma directives to select among such behaviors, since
there are situations where each could be most useful, but the
non-conforming option is probably the one which would allow the
largest amount of code to work efficiently without modification.
Post by David Brown
Of course, a compiler implementation can provide extensions beyond this
model - and perhaps limitations if necessary in order to keep an
efficient implementation. But a compiler implementation may not usurp
normal standard C keywords and usage for such different behaviour.
Compilers can do whatever they want. Such behavior may make them non-
conforming, but in some situations a non-conforming compiler might be more
useful than any fully-conforming compiler ever could be. For example, on
an 8-bit processor, a compiler that has a single 32-bit floating-point type
may be more useful than one which can't pass floating-point values to
variadic functions without bundling 48-bit-or-larger math code. If code
never needs anything beyond standard float precision, the extra machine code
for larger types would represent a waste of time and space.
Post by David Brown
In this particular case, the microcontroller in question was an AVR.
This is an 8-bit device, and it uses different cpu instructions to
access memory in RAM from memory in Flash. The old compiler (I omit the
name to protect the guilty) was trying to make it convenient for casual
users to have data such as tables or strings in flash. In contrast, the
standard compiler for this chip is gcc which uses normal semantics and
normal standard C. This means that if you write "const int x = 1234;"
and take the address of x, then x will be placed in RAM - even though it
never changes. This is necessary to make pointers work properly, as
normal C. The compiler provides extensions to allow you to put data
specifically in flash and access it directly (including a type of
universal pointer that can access anything).
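As a concrete sketch of those extensions in use (this assumes avr-gcc
with avr-libc's <avr/pgmspace.h>; the function name read_entry is just
for illustration, and the code will not compile for other targets):

    #include <avr/pgmspace.h>
    #include <stdint.h>

    /* Stored in flash only - it never takes up RAM. */
    static const uint8_t table[4] PROGMEM = { 1, 2, 4, 8 };

    uint8_t read_entry(uint8_t i)
    {
        /* A plain dereference would generate RAM-access instructions
           and read the wrong memory; data in flash has to be read
           explicitly. */
        return pgm_read_byte(&table[i & 3]);
    }
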
That's certainly a reasonable approach, and a quality compiler should
probably offer it as an option, but for many purposes a non-conforming
option that works like other compilers do may be more useful.
Post by David Brown
Post by s***@casperkitty.com
A pointer type for things that
were known to be in flash would have improved efficiency in some cases,
though access through flash was generally slow enough that the extra time
to check whether a pointer was a RAM address wasn't a problem. The treatment
meant that a *const* was incompatible with a **, and meant that code which
needlessly applied "const" to pointers would run much more slowly than it
should, but maintained correct behavior in most cases.
Maintaining compatibility with 100% of code would require using two-byte
pointers for everything and generating a function call for every pointer
access. While requiring that code mark everyplace where a one-byte pointer
should be used might have worked, it would have required a lot of code
markup.
Post by David Brown
Post by s***@casperkitty.com
Post by David Brown
Integer promotion rules aimed at being "helpful" to the programmer,
rather than following the standards.
The Standard allows implementations some flexibility to be helpful with
regard to signed integers. Unsigned integer types have rigidly-defined
behavior.
No, it does not allow such flexibility. If you have a uint8_t or an
int8_t, and you perform arithmetic, it is to be promoted to "int" before
carrying out the operation. A compiler that does not do that
(logically, at least - it does not actually have to do the 16+ bit
operations if only parts of the results are used) is a broken compiler
that causes confusion and means code that should be portable and
testable on different targets, no longer works.
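A small test case showing what is at stake (standard C; the comments
assume only that int is at least 16 bits wide, as the standard
requires):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t a = 200, b = 2;

        /* Both operands are promoted to int, so the multiplication
           gives 400 and the test is true.  A compiler that skipped the
           promotion and multiplied in 8 bits would get 144 (400 mod
           256) and take the other branch - and code tested on a
           conforming host would silently misbehave on such a target. */
        if (a * b > 255)
            printf("promoted to int, as required\n");
        else
            printf("arithmetic done in 8 bits - non-conforming\n");
        return 0;
    }
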
A compiler could just as well promote to an even longer type at its
convenience, except in contexts like "sizeof", since operations on the
longer type would behave identically to "int" in all cases where the
Standard would impose any requirements upon the latter....
You don't need to repeat your favourite bugbear yet again - it is not
relevant here, any more than it is most of the times you bring it up.
int64_t mul_and_add(int32_t x, int32_t y)
{
    return x*y;
}
When generating code for a 32-bit processor that includes a 32x32->64
multiply instruction, the Standard would allow a compiler to, at its
leisure, either perform the multiply as 32x32->32 and sign-extend, or
else perform a 32x32->64 computation and return the result as-is? I
don't know of any compilers that would actually promise the latter,
but the Standard would certainly allow a compiler to do so. On an
implementation which uses a 32-bit "int" on a 64-bit processor, such a promise
would be rather cheap, though code would have to avoid expressions that
mixed uint32_t with int32_t or smaller types to avoid icky semantics.
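For comparison, the portable way to guarantee the full 64-bit product,
without relying on any promise about what the compiler does with the
int arithmetic above (a sketch, not from the original post):

    #include <stdint.h>

    int64_t mul64(int32_t x, int32_t y)
    {
        /* Converting one operand first makes the multiplication take
           place in int64_t, so it cannot overflow and no
           implementation-specific guarantee about 32x32->64 behaviour
           is needed. */
        return (int64_t)x * y;
    }
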
MehdiAmini
2017-04-07 15:59:38 UTC
Permalink
[...]
Post by Ben Bacarisse
I might even just write
#define my_err(...) \
do { \
char buff[256]; \
snprintf(buff, sizeof(buff), __VA_ARGS__); \
my_err_s(buff); \
} while (0)
[...]

Does the above macro write to stderr?
--
www.my-c-codes.com/

Farewell.
David Brown
2017-04-07 16:42:41 UTC
Permalink
Post by MehdiAmini
[...]
Post by Ben Bacarisse
I might even just write
#define my_err(...) \
do { \
char buff[256]; \
snprintf(buff, sizeof(buff), __VA_ARGS__); \
my_err_s(buff); \
} while (0)
[...]
Does the above macro write to stderr?
The macro does not write anywhere - it calls "my_err_s". The function
definition I gave for "my_err_s" earlier writes to stderr.

If you are new to C, and we are going too fast for you, then please let
us know. There are plenty of people in this group that are happy to
help beginners, but unless you say then we must assume you are
reasonably familiar with the language. But if you /were/ familiar with
the language, you would know the answer to your own question here!
MehdiAmini
2017-04-08 03:34:32 UTC
Permalink
[...]
Post by David Brown
If you are new to C, and we are going too fast for you, then please let
us know. There are plenty of people in this group that are happy to
help beginners, but unless you say then we must assume you are
reasonably familiar with the language. But if you /were/ familiar with
the language, you would know the answer to your own question here!
[...]

Sorry, I am _not_ new to C. I just made a wrong assumption that this
my_err_s is somehow like printf_s and scanf_s.
--
www.my-c-codes.com/

Farewell.
Richard Heathfield
2017-04-08 06:51:26 UTC
Permalink
Post by MehdiAmini
[...]
Post by David Brown
If you are new to C, and we are going too fast for you, then please let
us know. There are plenty of people in this group that are happy to
help beginners, but unless you say then we must assume you are
reasonably familiar with the language. But if you /were/ familiar with
the language, you would know the answer to your own question here!
[...]
Sorry, I am _not_ new to C. I just made a wrong assumption that this
my_err_s is somehow like printf_s and scanf_s.
Why not ask Google instead of wasting your time asking experts? Their
answers are very unlikely to show up on a Google search, so it's obvious
they have nothing of value to tell you.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
MehdiAmini
2017-04-08 07:41:35 UTC
Permalink
Post by Richard Heathfield
Post by MehdiAmini
[...]
Post by David Brown
If you are new to C, and we are going too fast for you, then please let
us know. There are plenty of people in this group that are happy to
help beginners, but unless you say then we must assume you are
reasonably familiar with the language. But if you /were/ familiar with
the language, you would know the answer to your own question here!
[...]
Sorry, I am _not_ new to C. I just made a wrong assumption that this
my_err_s is somehow like printf_s and scanf_s.
Why not ask Google instead of wasting your time asking experts? Their
answers are very unlikely to show up on a Google search, so it's obvious
they have nothing of value to tell you.
Sorry, you misunderstood me on the "Reliable C tutorial for beginners"
thread. When I said that your suggested tutorial does not show up on the
first page of several search engines, I meant that beginners who search
for a C tutorial may end up reading tutorials that misinform them, since
those tutorials may not be reliable. By the way, I actually read your
suggested tutorial, even though I am not a beginner in C.
--
www.my-c-codes.com/

Farewell.
Richard Heathfield
2017-04-08 10:38:11 UTC
Permalink
On 08/04/17 08:41, MehdiAmini wrote:
<snip>
Post by MehdiAmini
Sorry, you misunderstood me on the "Reliable C tutorial for beginners"
thread. When I said that your suggested tutorial does not show up on the
first page of several search engines, I meant that beginners who search
for a C tutorial may end up reading tutorials that misinform them, since
those tutorials may not be reliable.
That comment might reasonably have been addressed to the purveyors of
the Internet search engines that you used, but there's no earthly point
in raising it here.
Post by MehdiAmini
By the way I actually read your
suggested tutorial. Even though I am not a beginner in C.
Even C experts are likely to learn something from Tom's tutorial.
--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Ben Bacarisse
2017-04-08 17:52:05 UTC
Permalink
Post by Richard Heathfield
<snip>
Post by MehdiAmini
Sorry, you misunderstood me on the "Reliable C tutorial for beginners"
thread. When I said that your suggested tutorial does not show up on the
first page of several search engines, I meant that beginners who search
for a C tutorial may end up reading tutorials that misinform them, since
those tutorials may not be reliable.
That comment might reasonably have been addressed to the purveyors of
the Internet search engines that you used, but there's no earthly
point in raising it here.
Surely it's helpful for others to know that? Someone might come across
this thread when searching for C tutorials but might not see the one
that contains the recommended link. It's not exactly a must-post remark,
but it seemed to be intended in a helpful way and I thought, on
reflection, that it might indeed be helpful.

<snip>
--
Ben.
David Brown
2017-04-08 11:48:59 UTC
Permalink
Post by MehdiAmini
[...]
Post by David Brown
If you are new to C, and we are going too fast for you, then please let
us know. There are plenty of people in this group that are happy to
help beginners, but unless you say then we must assume you are
reasonably familiar with the language. But if you /were/ familiar with
the language, you would know the answer to your own question here!
[...]
Sorry, I am _not_ new to C. I just made a wrong assumption that this
my_err_s is somehow like printf_s and scanf_s.
That's an odd assumption. /Your/ function was called "my_err", so when
changing it to a macro I needed an extra function name and used
"my_err_s". And I gave an implementation of it my post.

But people make mistakes or misread posts sometimes - no harm done. Go
back and read my first answer in this thread, and see if it will do the
job you need. (But use Ben's improvements to the macro.)
MehdiAmini
2017-04-08 16:04:34 UTC
Permalink
[...]
Post by David Brown
But people make mistakes or misread posts sometimes - no harm done. Go
back and read my first answer in this thread, and see if it will do the
job you need. (But use Ben's improvements to the macro.)
I am using this method, and it seems to work correctly.
--
www.my-c-codes.com/

Farewell.