The difference between strtol() and strtoul() ?


Kenny McCormack

2024-06-20 14:06:45 UTC

Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

Further, although the (Linux) man page is more than a bit murky on the
subject, it seems that the result of parsing, say, "-1", with strtoul() is
the largest unsigned value (usually, 2**N-1 or a lot of F's (in hex)).
Whereas, I would expect it to be 1 (i.e., just take the absolute value).
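A minimal sketch of the behaviour described above, assuming a hosted C99 (or later) implementation. `parse_ul` and `check_sign_handling` are hypothetical names. The asserts record what the standard requires: a leading sign is accepted, and "-1" converts the value 1 and then negates it in unsigned long, without setting ERANGE.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: base-10 strtoul() that reports whether ERANGE was set. */
unsigned long parse_ul(const char *s, int *range_error)
{
    errno = 0;                      /* strtoul() never resets errno to 0 itself */
    unsigned long v = strtoul(s, NULL, 10);
    *range_error = (errno == ERANGE);
    return v;
}

void check_sign_handling(void)
{
    int err;

    assert(parse_ul("+1", &err) == 1UL && !err);          /* '+' is accepted */
    assert(parse_ul("-1", &err) == ULONG_MAX && !err);    /* 1, negated in unsigned long */
    assert(parse_ul("-2", &err) == ULONG_MAX - 1 && !err);
}
```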

Comments? I find this all very counterintuitive.

(*) Or should I say, "one would" ?

P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.

--
If you think you have any objections to anything I've said above, please
navigate to this URL:
http://www.xmission.com/~gazelle/Truth
This should clear up any misconceptions you may have.

Scott Lurndal

2024-06-20 14:46:52 UTC

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

The strtoul/strtoull function semantics match the C language semantics.

$ cat /tmp/a.c
#include <stdio.h>
int main(int argc, const char **argv)
{
unsigned long v = -1ul;

printf("0x%lx\n", v);
return 0;
}
$ cc -Wall -Werror -o /tmp/a /tmp/a.c
$ /tmp/a
0xffffffffffffffff
$
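The same point in portable form, assuming only C99: the value strtoul() returns for a string with a leading '-' equals the C expression obtained by applying unary minus to an unsigned long constant. `check_matches_unary_minus` is a hypothetical name, and unlike the transcript it does not assume a 64-bit unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* strtoul("-n") produces the value the C expression -nul has:
   n reduced modulo ULONG_MAX + 1 after negation. */
void check_matches_unary_minus(void)
{
    assert(strtoul("-1", NULL, 10) == -1ul);
    assert(strtoul("-42", NULL, 10) == -42ul);
    assert(-42ul == ULONG_MAX - 41);
}
```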

Keith Thompson

2024-06-20 21:37:29 UTC

***@slp53.sl.home (Scott Lurndal) writes:
[snip]

Post by Scott Lurndal
The strtoul/strtoull function semantics match the C language semantics.
$ cat /tmp/a.c
#include <stdio.h>
int main(int argc, const char **argv)
{
unsigned long v = -1ul;
printf("0x%lx\n", v);
return 0;
}
$ cc -Wall -Werror -o /tmp/a /tmp/a.c
$ /tmp/a
0xffffffffffffffff
$

The functions accept a syntax that doesn't exactly match anything
in C's grammar.

Both accept integer constants and a restricted subset of other
integer constant expressions. "1", "+1", and "-1" are accepted.
"1+1" is not (nor is "- 1").
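The accepted syntax can be made visible through the end pointer. A sketch assuming base 10, with the hypothetical names `chars_consumed` and `check_accepted_forms`. When no conversion is performed the end pointer is set back to the start of the string, so 0 characters are reported.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns how many characters of s strtol() consumed; 0 means that no
   conversion was performed (the end pointer is set back to s). */
size_t chars_consumed(const char *s)
{
    char *end;
    (void)strtol(s, &end, 10);
    return (size_t)(end - s);
}

void check_accepted_forms(void)
{
    assert(chars_consumed("1") == 1);
    assert(chars_consumed("+1") == 2);
    assert(chars_consumed("-1") == 2);
    assert(chars_consumed("1+1") == 1);   /* stops at the '+' */
    assert(chars_consumed("- 1") == 0);   /* no space allowed after a sign */
    assert(chars_consumed("  -1") == 4);  /* leading white space is skipped */
}
```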

For signed integers, that's perfectly reasonable; +1 and -1 are
expressions, not constants, but users are not likely to care about
the distinction. These functions deal with user input, not C syntax.

For unsigned integers, it would have made sense to disallow signs,
or at least disallow leading '-'. The behavior of unary "-" for
unsigned integers is well defined, but probably not something that
users should need to be aware of when providing program input.
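If one wanted the stricter unsigned behaviour suggested here, a wrapper can reject any sign before calling strtoul(). This is a sketch under stated assumptions, not a standard interface: `parse_unsigned_strict` is hypothetical, it accepts only base 10, and it rejects '+' as well as '-'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stricter alternative to strtoul(): optional leading white
   space, then decimal digits only. Any sign, trailing characters, or an
   out-of-range value makes the whole parse fail. */
bool parse_unsigned_strict(const char *s, unsigned long *out)
{
    assert(s != NULL && out != NULL);
    while (isspace((unsigned char)*s))
        s++;
    if (!isdigit((unsigned char)*s))    /* rejects "-1", "+1" and "" */
        return false;

    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;
    *out = v;
    return true;
}
```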

My guess is that the authors of strtol and strtoul thought
consistency between the two functions was important, and I'm not
sure I disagree -- but interpreting "-1" as 18446744073709551615
can certainly be counterintuitive.

The ANSI C Rationale indicates that the strto*() functions were
adopted from UNIX System V.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Lew Pitcher

2024-06-20 14:48:53 UTC

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

IIUC, the ISO C standard does not make a distinction between strings that
make sense for an unsigned long vs strings that make sense for a signed long.
The standard says (with regards to the strtol, strtoll, strtoul, and strtoull
functions):
"... the expected form of the subject sequence is a sequence of letters
and digits representing an integer with the radix specified by base,
optionally preceded by a plus or minus sign ... . If the value of base
is 16, the characters 0x or 0X may optionally precede the sequence of
letters and digits, following the sign if present."
so, it appears that the ISO C standard permits the input string to specify
a sign, even if the resulting conversion does not.
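To illustrate the quoted ordering rule (sign first, then the optional 0x prefix, then digits), a small check assuming a conforming library such as glibc; `check_sign_then_prefix` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

void check_sign_then_prefix(void)
{
    char *end;

    /* Sign first, then the optional 0x prefix, then the hex digits. */
    assert(strtoul("-0x10", &end, 16) == -16ul && *end == '\0');
    assert(strtoul("-0x10", &end, 0) == -16ul && *end == '\0');   /* base 0: hex */

    /* "0x" not followed by a hex digit: only the "0" is converted. */
    assert(strtoul("0x-10", &end, 16) == 0 && *end == 'x');
}
```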

Post by Kenny McCormack
Further, although the (Linux) man page is more than a bit murky on the
subject, it seems that the result of parsing, say, "-1", with strtoul() is
the largest unsigned value (usually, 2**N-1 or a lot of F's (in hex)).
Whereas, I would expect it to be 1 (i.e., just take the absolute value).

Why would you expect that? Again, the ISO standard says:
"If the subject sequence has the expected form ... it is used as the base
for conversion, ascribing to each letter its value ... . If the subject
sequence begins with a minus sign, the value resulting from the conversion
is negated (in the return type)."
and
"If the correct value is outside the range of representable values, LONG_MIN,
LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned
(according to the return type and sign of the value, if any) ... ."
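A sketch of the out-of-range rule, assuming long and unsigned long are no wider than 64 bits (so a 20-digit run of nines overflows both); `check_clamping` is a hypothetical name. errno is cleared before each call because these functions only ever set it.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumes long and unsigned long are at most 64 bits wide. */
void check_clamping(void)
{
    errno = 0;
    assert(strtol("99999999999999999999", NULL, 10) == LONG_MAX && errno == ERANGE);

    errno = 0;
    assert(strtol("-99999999999999999999", NULL, 10) == LONG_MIN && errno == ERANGE);

    errno = 0;
    assert(strtoul("99999999999999999999", NULL, 10) == ULONG_MAX && errno == ERANGE);
}
```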

Post by Kenny McCormack
Comments? I find this all very counterintuitive.

I can't comment on /your/ internalization of the standards and expected
behaviour. But, the standard makes sense (in an eccentric sort of way)
to me, in that the defining distinction of the various strto*l() functions
is not the format of the input, but the format of the output of the function.

Post by Kenny McCormack
(*) Or should I say, "one would" ?
P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.

--
Lew Pitcher
"In Skills We Trust"

Lew Pitcher

2024-06-20 15:26:51 UTC

On Thu, 20 Jun 2024 14:06:45 +0000, Kenny McCormack wrote:

[snip]

Post by Kenny McCormack
P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.

I don't know, but I'd /guess/ that, because the strto*l() functions return
a value that can easily be range-checked and (possibly) truncated to fit in
an int, the ISO committee didn't see any reason to add another set of specialized
functions.
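Along these lines, an int-sized variant is a few lines on top of strtol(). This is a hypothetical sketch (`parse_int_clamped` is not a standard function) that copies strtol()'s convention of clamping and setting ERANGE. Names beginning with "str" and a lowercase letter, such as "strtoi", are reserved for future library use, hence the different name. The second check assumes a 32-bit int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical int-sized wrapper around strtol(): clamps to INT_MIN/INT_MAX
   and sets ERANGE, mirroring what strtol() does for long. */
int parse_int_clamped(const char *nptr, char **endptr, int base)
{
    long v = strtol(nptr, endptr, base);

    if (v > INT_MAX) {
        errno = ERANGE;
        return INT_MAX;
    }
    if (v < INT_MIN) {
        errno = ERANGE;
        return INT_MIN;
    }
    return (int)v;
}

void check_parse_int_clamped(void)
{
    errno = 0;
    assert(parse_int_clamped("-123", NULL, 10) == -123 && errno == 0);
    errno = 0;
    assert(parse_int_clamped("99999999999", NULL, 10) == INT_MAX && errno == ERANGE);
}
```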

--
Lew Pitcher
"In Skills We Trust"

Kaz Kylheku

2024-06-20 22:55:01 UTC

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

unsigned int x = -42; // well-defined result: UINT_MAX - 41 (UINT_MAX itself is implementation-defined)

These functions seem to be geared toward the C language (perhaps writing
compilers or tooling for C). Note that these functions recognize
a leading zero for octal when base is specified as zero, and also
recognize the 0x prefix when base is 0 or 16.

So it is unsurprising that the unsigned functions would accept
negative values and do the modulo reduction.
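Both observations can be checked directly. A sketch assuming a hosted C99 implementation, with the hypothetical name `check_c_like_rules`. The initialisation of x is an ordinary conversion to unsigned int, which is reduced modulo UINT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

void check_c_like_rules(void)
{
    /* Converting a negative value to unsigned is defined: it is reduced
       modulo UINT_MAX + 1, so -42 becomes UINT_MAX - 41. */
    unsigned int x = -42;
    assert(x == UINT_MAX - 41);

    /* With base 0 the prefixes follow C's integer-constant rules. */
    assert(strtol("010", NULL, 0) == 8);      /* leading 0: octal */
    assert(strtol("0x1f", NULL, 0) == 31);    /* 0x: hexadecimal */
    assert(strtol("0x1f", NULL, 16) == 31);   /* 0x is optional with base 16 */
    assert(strtol("010", NULL, 10) == 10);    /* base 10: no octal */
}
```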

Post by Kenny McCormack
Further, although the (Linux) man page is more than a bit murky on the
subject, it seems that the result of parsing, say, "-1", with strtoul() is
the largest unsigned value (usually, 2**N-1 or a lot of F's (in hex)).
Whereas, I would expect it to be 1 (i.e., just take the absolute value).
Comments? I find this all very counterintuitive.
(*) Or should I say, "one would" ?
P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.

I suspect, because, at the time strtol was introduced, long was the
widest integer type.

When designing an integer parsing function, why would you not just
have one function, working with the widest type?

Unfortunately, though, strtoll later had to be added.

If strtol didn't exist today, making it necessary to invent it or
something like it, that function should use the intmax_t type.
Then there wouldn't be any need to add new variants going forward.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Kenny McCormack

2024-06-20 23:35:37 UTC

In article <***@kylheku.com>,
Kaz Kylheku <643-408-***@kylheku.com> wrote:
...

Post by Kaz Kylheku
If strtol didn't exist today, making it necessary to invent it or
something like it, that function should use the intmax_t type.
Then there wouldn't be any need to add new variants going forward.

There actually is.

STRTOIMAX(3) Linux Programmer's Manual STRTOIMAX(3)

NAME
strtoimax, strtoumax - convert string to integer

SYNOPSIS
#include <inttypes.h>

intmax_t strtoimax(const char *nptr, char **endptr, int base);
uintmax_t strtoumax(const char *nptr, char **endptr, int base);

DESCRIPTION
These functions are just like strtol(3) and strtoul(3), except that
they return a value of type intmax_t and uintmax_t, respectively.
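A quick check of these two functions, assuming only what C99's <inttypes.h> guarantees; `check_strtoimax` is a hypothetical name. The extreme values are formatted with PRIdMAX/PRIuMAX so the check does not depend on the width of intmax_t.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>

void check_strtoimax(void)
{
    char buf[64];
    char *end;

    /* Round-trip the extreme values, whatever the width of intmax_t. */
    snprintf(buf, sizeof buf, "%" PRIdMAX, INTMAX_MIN);
    errno = 0;
    assert(strtoimax(buf, &end, 10) == INTMAX_MIN && errno == 0 && *end == '\0');

    snprintf(buf, sizeof buf, "%" PRIuMAX, UINTMAX_MAX);
    errno = 0;
    assert(strtoumax(buf, &end, 10) == UINTMAX_MAX && errno == 0 && *end == '\0');

    /* Same sign rule as strtoul(): "-1" yields UINTMAX_MAX. */
    assert(strtoumax("-1", NULL, 10) == UINTMAX_MAX);
}
```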

--
Conservatives want smaller government for the same reason criminals want fewer cops.

Kenny McCormack

2024-06-21 13:58:01 UTC

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

There have been some useful responses on this thread, which is Good. Of
course, there have also been the usual crappola-type responses, but one must
learn to take the good with the bad.

Anyway, I think the takeaway is that while it is what it is, an argument
can certainly be made that it would have been better for the unsigned
versions of these functions to not accept signed input. If I were designing
it, I would have had strtoul("-1") be a syntax error (not a C language
syntax error - but a meta-language syntax error) - or, if not that, then
have it return 1, not 2**N-1. But that's just me.

I appreciate the responses indicating that it was probably done the way it
was for both of these reasons:
1) Because it makes it more useful for C compiler writers - who were
seen as the primary audience.
2) Because it means that the two functions are literally the same code.
Both calculate the same bit pattern - the difference is only in the
caller's interpretation of the result.
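Point 2 holds only while the value fits in long. A sketch with the hypothetical helpers `same_bits` and `check_same_bits`, assuming a hosted C99 implementation: for in-range text, converting strtol()'s result to unsigned long matches strtoul(), but outside long's range strtol() clamps to LONG_MAX and the two diverge.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero when strtol()'s result, converted to
   unsigned long, equals what strtoul() returns for the same text. */
int same_bits(const char *s)
{
    return (unsigned long)strtol(s, NULL, 10) == strtoul(s, NULL, 10);
}

void check_same_bits(void)
{
    char buf[64];

    assert(same_bits("123"));
    assert(same_bits("-1"));    /* (unsigned long)-1L == ULONG_MAX */
    assert(same_bits("-42"));

    /* Outside long's range they diverge: strtol() clamps to LONG_MAX. */
    snprintf(buf, sizeof buf, "%lu", ULONG_MAX);
    assert(!same_bits(buf));
}
```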

Post by Kenny McCormack
P.S. Why isn't there a strtoi() or strtou() ? I know, of course, that
there is atoi(), but that doesn't have the error checking capability that
the strto* functions have.

Yeah, now I get it. You really only need strtoimax() and strtoumax().

A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/GodDelusion

Michael S

2024-06-21 15:28:39 UTC

On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

Or check the range and handle out-of-range values as the situation
requires.
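For the record, strtoimax() and strtoumax() have been part of ISO C since C99, declared in <inttypes.h>. A sketch of the range-check approach built on strtoimax(), where `parse_in_range` is a hypothetical helper that rejects out-of-range input rather than clamping or wrapping it:

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>

/* Hypothetical helper: parse a whole string as a base-10 integer and accept
   it only if it lies in [lo, hi]; the caller decides how to handle failure. */
bool parse_in_range(const char *s, intmax_t lo, intmax_t hi, intmax_t *out)
{
    char *end;
    errno = 0;
    intmax_t v = strtoimax(s, &end, 10);
    if (end == s || *end != '\0')        /* no digits, or trailing junk */
        return false;
    if (errno == ERANGE || v < lo || v > hi)
        return false;
    *out = v;
    return true;
}

void check_parse_in_range(void)
{
    intmax_t v;
    assert(parse_in_range("200", 0, 255, &v) && v == 200);
    assert(!parse_in_range("256", 0, 255, &v));
    assert(!parse_in_range("-1", 0, 255, &v));    /* rejected, not wrapped */
    assert(!parse_in_range("12abc", 0, 255, &v));
}
```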

Michael S

2024-06-21 15:53:14 UTC

On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.

Scott Lurndal

2024-06-21 16:14:58 UTC

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.

SuS specifies ERANGE as the errno value set if the converted value is out of range.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/strtoull.html
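The usual calling protocol that follows from this errno rule, sketched with the hypothetical name `check_errno_protocol` and assuming long is no wider than 64 bits. Because library functions never set errno to zero, a stale ERANGE survives a successful call, so errno must be cleared before calling strtol() if it is to be checked afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

void check_errno_protocol(void)
{
    long v;

    errno = ERANGE;                        /* stale value left by an earlier call */
    v = strtol("5", NULL, 10);
    assert(v == 5 && errno == ERANGE);     /* success does not clear errno */

    errno = 0;                             /* so clear it before the call... */
    v = strtol("99999999999999999999", NULL, 10);
    assert(v == LONG_MAX && errno == ERANGE);   /* ...and ERANGE then means overflow */
}
```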

Scott Lurndal

2024-06-21 16:54:33 UTC

Post by Scott Lurndal

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.

SuS specifies ERANGE as the errno value set if the converted value is out of range.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/strtoull.html

It should be quite clear what is stored at endptr in all cases from the
POSIX description.

Lawrence D'Oliveiro

2024-06-22 06:44:38 UTC

Post by Scott Lurndal
It should be quite clear what is stored at endptr in all cases from the
POSIX description.

You really need to be checking the C spec, just in case.

Scott Lurndal

2024-06-22 15:16:24 UTC

Post by Lawrence D'Oliveiro

Post by Scott Lurndal
It should be quite clear what is stored at endptr in all cases from the
POSIX description.

You really need to be checking the C spec, just in case.

No, I don't. The posix document clearly states that the text
is from ISO C (and clearly marks any extensions).

You really need to control the need to reply to every post.

Lawrence D'Oliveiro

2024-06-22 23:21:43 UTC

Post by Scott Lurndal

Post by Lawrence D'Oliveiro

Post by Scott Lurndal
It should be quite clear what is stored at endptr in all cases from the
POSIX description.

You really need to be checking the C spec, just in case.

No, I don't.

It is the authoritative reference.

James Kuyper

2024-06-23 00:10:32 UTC

...

Post by Scott Lurndal

Post by Lawrence D'Oliveiro
You really need to be checking the C spec, just in case.

No, I don't. The posix document clearly states that the text
is from ISO C (and clearly marks any extensions).

It also clearly states:
"The functionality described on this reference page is aligned with the
ISO C standard. Any conflict between the requirements described here and
the ISO C standard is unintentional. This volume of POSIX.1-2017 defers
to the ISO C standard."

This tells you two important things: they believe that there's a small
but significant chance of their description being unintentionally in
conflict with the C standard. And, if that is the case, POSIX defers to C.
You're better off reading the original than the thing that is supposed
to be a faithful copy, but might not be.

Ben Bacarisse

2024-06-21 17:15:07 UTC

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of these
functions and storing the result in an object of the smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.

It says what value should be returned. That's something certain!

As for what gets put into *str_end that page could be clearer. The
standard says that a pointer just past the last of the digits is stored,
provided the input has the right form (spaces, sign, prefix, digits).
The cppreference page says a pointer just past "the last numeric
character interpreted" which begs the question of what "interpreted"
means when the result is possibly out of range. Maybe saying "scanned"
rather than interpreted would be better. The end pointer always points
just past any syntactically valid characters, even when the result is
out of range.
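A sketch of this last point, assuming long is no wider than 64 bits; `check_end_after_overflow` is a hypothetical name. The value is out of range, yet the end pointer still lands just past the 20 digits that were scanned.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

void check_end_after_overflow(void)
{
    const char *s = "99999999999999999999xyz";   /* 20 digits, then junk */
    char *end;

    errno = 0;
    long v = strtol(s, &end, 10);
    assert(v == LONG_MAX && errno == ERANGE);
    assert(end == s + 20 && *end == 'x');        /* every digit was scanned */
}
```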

--
Ben.

Michael S

2024-06-23 09:19:52 UTC

On Fri, 21 Jun 2024 18:15:07 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of
these functions and storing the result in an object of the
smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs,
but at least https://en.cppreference.com/w/c/string/byte/strtol
does not say anything certain, especially about what is stored in
*str_end.

It says what value should be returned. That's something certain!

In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.
Also, at least to me, the Standard text itself appears very far from clear
and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behave in this manner.
It seems the problem is that what is considered the "range of representable
values" for an unsigned type is itself open to interpretation.

IMHO, even if in some part of the standard there exists text that
clearly states that "range of representable values for unsigned long =
[-ULONG_MAX:ULONG_MAX]" it is worth repeating that in the section that
defines strtol, because it is not at all intuitive.

Ben Bacarisse

2024-06-23 11:38:51 UTC

Post by Michael S
On Fri, 21 Jun 2024 18:15:07 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of
these functions and storing the result in an object of the
smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range inputs,
but at least https://en.cppreference.com/w/c/string/byte/strtol
does not say anything certain, especially about what is stored in
*str_end.

It says what value should be returned. That's something certain!

In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.

Ah. What's the discrepancy you see?

Post by Michael S
Also, at least to me, the Standard text itself appears very far from clear
and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behave in this manner.

I don't get that from the text. There is, after all, no "negative
input". There is a "subject sequence" which, if it starts with a minus
sign, causes the "value resulting from the conversion is negated (in the
return type)" which seems clear enough.

Post by Michael S
It seems the problem is that what is considered the "range of representable
values" for an unsigned type is itself open to interpretation.
IMHO, even if in some part of the standard there exists text that
clearly states that "range of representable values for unsigned long =
[-ULONG_MAX:ULONG_MAX]" it is worth repeating that in the section that
defines strtol, because it is not at all intuitive.

I don't get what you are saying here. The range of values is [0:ULONG_MAX].

--
Ben.

Michael S

2024-06-23 12:32:19 UTC

On Sun, 23 Jun 2024 12:38:51 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:15:07 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of
these functions and storing the result in an object of the
smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range
inputs, but at least
https://en.cppreference.com/w/c/string/byte/strtol does not say
anything certain, especially about what is stored in *str_end.

It says what value should be returned. That's something certain!

In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.

Ah. What's the discrepancy you see?

IMHO, the Standard text allows for more interpretations (and
misinterpretations) than the cppreference.com text.

Post by Ben Bacarisse

Post by Michael S
Also, at least to me, the Standard text itself appears very far from
clear and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behave in this manner.

I don't get that from the text. There is, after all, no "negative
input". There is a "subject sequence" which, if it starts with a
minus sign, causes the "value resulting from the conversion is
negated (in the return type)" which seems clear enough.

I find it less than clear.
The least clear part is that, for strtouxx(), as long as the "subject
sequence" is in range, it is first converted and then negated. However
when "subject sequence" is out of range it is converted, then clipped
and then *not* negated.
I am not confused in a similar way by the non-u variants of strtoxx().

Post by Ben Bacarisse

Post by Michael S
It seems the problem is that what is considered the "range of
representable values" for an unsigned type is itself open to
interpretation.
IMHO, even if in some part of the standard there exists text that
clearly states that "range of representable values for unsigned
long = [-ULONG_MAX:ULONG_MAX]" it is worth repeating that in the
section that defines strtol, because it is not at all intuitive.

I don't get what you are saying here. The range of values is
[0:ULONG_MAX].

That's true as long as you see the sign as something detached from the rest of the
number. I tend to see them as parts of a whole. Maybe that's my
mistake.

Ben Bacarisse

2024-06-23 15:30:13 UTC

Post by Michael S
On Sun, 23 Jun 2024 12:38:51 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:15:07 +0100

Post by Ben Bacarisse

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and strtoumax().

Which are, unfortunately, not part of the C standard.

Post by Kenny McCormack
A result of any smaller type can be obtained by calling one of
these functions and storing the result in an object of the
smaller type.

Or check the range and handle out-of-range values as the situation
requires.

BTW, I don't know what The Standard says about out-of-range
inputs, but at least
https://en.cppreference.com/w/c/string/byte/strtol does not say
anything certain, especially about what is stored in *str_end.

It says what value should be returned. That's something certain!

In case of strtol, yes.
In case of strtoul it also says what value should be returned, but
plain reading of cppreference.com text (at least *my* plain reading)
does not match observed behaviour. The text on cppreference.com
resembles Standard text, but does not match it.

Ah. What's the discrepancy you see?

IMHO, the Standard text allows for more interpretations (and
misinterpretations) than the cppreference.com text.

I was hoping for an example. As I've used these functions for decades,
I find it hard to see where the alternative interpretations might lie.

Post by Michael S

Post by Ben Bacarisse

Post by Michael S
Also, at least to me, the Standard text itself appears very far from
clear and way too open to interpretation.
My own interpretation would be that for any negative input strtoul()
should return ULONG_MAX and set errno to ERANGE. None of the actual
implementations that I tested behave in this manner.

I don't get that from the text. There is, after all, no "negative
input". There is a "subject sequence" which, if it starts with a
minus sign, causes the "value resulting from the conversion is
negated (in the return type)" which seems clear enough.

I find it less than clear.
The least clear part is that, for strtouxx(), as long as the "subject
sequence" is in range,

I think it helps to be precise here: the subject sequence has to be of
the right form, not in the right range.

Post by Michael S
it is first converted and then negated. However
when "subject sequence" is out of range it is converted, then clipped
and then *not* negated.

If the conversion (before negation) is out of range the result will be
ULONG_MAX and errno will be set to ERANGE. Calling this "clipping" is
possibly confusing. For what it's worth, I'm just describing what
happens. I am not saying it is crystal clear.
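The cases discussed here can be pinned down in a sketch, assuming a conforming library such as glibc; `check_unsigned_negation` is a hypothetical name. The in-range magnitude ULONG_MAX, negated in unsigned long, gives 1; a magnitude above ULONG_MAX gives ULONG_MAX with ERANGE and no negation.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

void check_unsigned_negation(void)
{
    char buf[64];

    /* In range: 1 is converted, then negated in unsigned long. */
    errno = 0;
    assert(strtoul("-1", NULL, 10) == ULONG_MAX && errno == 0);

    /* "-ULONG_MAX" is also in range; negating ULONG_MAX gives 1. */
    snprintf(buf, sizeof buf, "-%lu", ULONG_MAX);
    errno = 0;
    assert(strtoul(buf, NULL, 10) == 1 && errno == 0);

    /* Magnitude above ULONG_MAX: ULONG_MAX with ERANGE, no negation. */
    snprintf(buf, sizeof buf, "-%lu0", ULONG_MAX);
    errno = 0;
    assert(strtoul(buf, NULL, 10) == ULONG_MAX && errno == ERANGE);
}
```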

I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

--
Ben.

Michael S

2024-06-23 15:47:10 UTC

On Sun, 23 Jun 2024 16:30:13 +0100

Post by Ben Bacarisse
As I've used these functions for
decades, I find it hard to see where the alternative interpretations
might lie.

I have also used them for decades, but until last Thursday I never paid
attention to what happens when they are fed out-of-range (OOR) inputs.

Tim Rentsch

2024-06-23 17:58:30 UTC

Ben Bacarisse <***@bsb.me.uk> writes:

[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

Scott Lurndal

2024-06-23 21:19:51 UTC

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

I think you're both overthinking it.

Tim Rentsch

2024-06-24 05:28:37 UTC

Post by Scott Lurndal

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

I think you're both overthinking it.

You aren't saying anything. Do you have something to
say that actually has positive information content?

Keith Thompson

2024-06-23 23:01:34 UTC

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

Can you give an example where the specified behavior causes undefined
behavior?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

Ben Bacarisse

2024-06-23 23:49:13 UTC

Post by Keith Thompson

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

Can you give an example where the specified behavior causes undefined
behavior?

I don't want to pre-empt Tim's answer, but the wording that bothers me
is

"If the subject sequence begins with a minus sign, the value resulting
from the conversion is negated (in the return type)."

For strtoll("-9223372036854775808", 0, 0) the value resulting from the
conversion is 9223372036854775808, which cannot even be represented in
the return type, so how can it be negated "in the return type"?

--
Ben.

Keith Thompson

2024-06-24 00:49:01 UTC

Post by Ben Bacarisse

Post by Keith Thompson

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

Can you give an example where the specified behavior causes undefined
behavior?

I don't want to pre-empt Tim's answer, but the wording that bothers me
is
"If the subject sequence begins with a minus sign, the value resulting
from the conversion is negated (in the return type)."
For strtoll("-9223372036854775808", 0, 0) the value resulting from the
conversion is 9223372036854775808, which cannot even be represented in
the return type, so how can it be negated "in the return type"?

Understanding the significance of your example requires recognizing
that number, which I didn't immediately.

I'll assume in the following that long long and intmax_t are 64 bits,
2's-complement, no padding bits.

9223372036854775808 is 2**63, and is mathematically equal to
LLONG_MAX+1.

-9223372036854775808 is mathematically equal to LLONG_MIN,
but the behavior of the strtoll() call is specified in
terms of computing 9223372036854775808 (outside the range of
long long) and then negating it. It's obvious (I think) that
strtoll("-9223372036854775808", 0, 0) *should* return LLONG_MIN and
not set errno to ERANGE (and that is what it does in every implementation
I've tried), but the way the standard describes it involves a semantically
impossible operation.

-9223372036854775808 is the mathematical value of LLONG_MIN, but
it's not a valid C expression (so <limits.h> typically has to use
some workaround like (-LLONG_MAX-1)) -- but we expect strtoll to
handle it in the obvious way.

Beyond this example, the wording is also problematic
for out-of-range values with a leading '-' sign, such as
strtoll("-9999999999999999999", 0, 0). The result should be
LLONG_MIN with errno==ERANGE, but again the standard says "the value
resulting from the conversion is negated (in the return type)",
which is not actually possible. The same applies to strtoull().
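A small check of the two cases above, assuming (as Keith does) a 64-bit
two's-complement long long. The helper name ll_with_errno is made up for
illustration; it only wraps strtoll() and captures errno, which has to be
cleared before the call because the strto* functions never reset it to 0.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts s with strtoll() (base 0) and reports
   errno separately. errno is cleared first because strtoll() only ever
   sets it (to ERANGE), never resets it. */
long long ll_with_errno(const char *s, int *err)
{
    errno = 0;
    long long v = strtoll(s, NULL, 0);
    *err = errno;
    return v;
}

/* Expected with a 64-bit long long:
     ll_with_errno("-9223372036854775808", &e) -> LLONG_MIN, e == 0
     ll_with_errno("-9999999999999999999", &e) -> LLONG_MIN, e == ERANGE */
```

Both results follow the intent Keith describes (a mathematical value that
is range-checked against the return type), not a literal negation "in the
return type".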

It's not surprising that implementers have inferred the intent
even if the standard doesn't precisely state it. Still, I'd like
to see the wording made more precise.

Kaz Kylheku

2024-06-24 02:29:19 UTC

Post by Ben Bacarisse
I don't want to pre-empt Tim's answer, but the wording that bothers me
is
"If the subject sequence begins with a minus sign, the value
resulting from the conversion is negated (in the return type)."
For strtoll("-9223372036854775808", 0, 0) the value resulting from the
conversion is 9223372036854775808, which cannot even be represented in
the return type, so how can it be negated "in the return type"?

We have to trust that the specification wants the functions to perform
error checking, rather than precipitate into undefined behavior or
implementation-defined results.

If the negation, which is a positive value, cannot be represented in the
type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.

The "in the return type" wording sounds like it may be written that way
to cover the unsigned case, strtoull.

I see in the N3220 draft that the signed and unsigned functions are
lumped together and the wording is now:

"If the subject sequence begins with a minus sign, the resulting value
is the negative of the converted value; for functions whose return type
is an unsigned integer type this action is performed in the return
type."

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca

Kaz Kylheku

2024-06-24 02:31:11 UTC

Post by Kaz Kylheku
If the negation, which is a positive value, cannot be represented in the
type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.

Errr, what am I saying! The negation, which is a negative value,
cannot be represented in the type, so the required behavior is to
return LLONG_MIN and set errno to negative.

Keith Thompson

2024-06-24 03:12:24 UTC

Post by Kaz Kylheku

Post by Kaz Kylheku
If the negation, which is a positive value, cannot be represented in the
type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.

Errr, what am I saying! The negation, which is a negative value,
cannot be represented in the type, so the required behavior is to
return LLONG_MIN and set errno to negative.

You mean "and set errno to ERANGE".

Kaz Kylheku

2024-06-24 06:05:33 UTC

Post by Keith Thompson

Post by Kaz Kylheku

Post by Kaz Kylheku
If the negation, which is a positive value, cannot be represented in the
type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.

Errr, what am I saying! The negation, which is a negative value,
cannot be represented in the type, so the required behavior is to
return LLONG_MIN and set errno to negative.

You mean "and set errno to ERANGE".

Once you screw up and start correcting yourself, there is no end
to the long tail of erors.

Keith Thompson

2024-06-24 03:11:09 UTC

Post by Kaz Kylheku

Post by Ben Bacarisse
I don't want to pre-empt Tim's answer, but the wording that bothers me
is
"If the subject sequence begins with a minus sign, the value
resulting from the conversion is negated (in the return type)."
For strtoll("-9223372036854775808", 0, 0) the value resulting from the
conversion is 9223372036854775808, which cannot even be represented in
the return type, so how can it be negated "in the return type"?

We have to trust that the specification wants the functions to perform
error checking, rather than precipitate into undefined behavior or
implementation-defined results.
If the negation, which is a positive value, cannot be represented in the
type, that implies it is out of range. The required behavior for a
positive out-of-range value is to return LLONG_MAX and set errno to
ERANGE.
The "in the return type" wording sounds like it may be written that way
to cover the unsigned case, strtoull.
I see in the N3220 draft that the signed and unsigned functions are
lumped together and the wording is now:
"If the subject sequence begins with a minus sign, the resulting value
is the negative of the converted value; for functions whose return type
is an unsigned integer type this action is performed in the return
type."

I should have checked the C23 draft before. I see that the wording has
been improved.

(Note that N3220 is actually an early draft of C26. The latest public
pre-C23 draft is N3149. The content should be very close; I don't
believe N3220 includes any post-C23 proposed changes.)

It's fairly clear that the "value" referred to in the quoted text is a
mathematical value, which might be outside the representable range of
any C type. The paragraph describing the returned value confirms this:
"If the correct value is outside the range of representable values ...".

So for strtoll("-9223372036854775808", NULL, 10) the "converted value"
of 9223372036854775808 exceeds LLONG_MAX, but that's ok. That value is
negated (mathematically) yielding -9223372036854775808, which is equal
to LLONG_MIN.

There's still some ambiguity for strtoull("-9999999999999999999", NULL,
10) (that's well outside the range of a 64-bit integer). For that to
work as expected, we have to assume that the determination that "the
correct value is outside the range of representable values" happens
*before* the negation "is performed in the return type". It's not clear
that this problem is worth fixing (doing so would likely make that
section longer and perhaps more confusing).
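To make that ordering concrete, here is a deliberately simplified model
of strtoull(). It is not how any particular C library implements the
function: the name sketch_strtoull10 is hypothetical, and it handles base
10 only, with no leading white space, no base prefixes and no endptr. It
range-checks the magnitude first and only then negates in unsigned long
long, which is the reading under which strtoull("-9999999999999999999",
NULL, 10) returns ULLONG_MAX with ERANGE.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical, simplified model of strtoull() for base 10 only.
   Step 1: accumulate the magnitude, detecting overflow.
   Step 2: if the magnitude is out of range, clamp and set ERANGE.
   Step 3: otherwise negate in the return type (modulo ULLONG_MAX + 1). */
unsigned long long sketch_strtoull10(const char *s)
{
    int negative = 0;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }

    unsigned long long mag = 0;
    int overflow = 0;
    for (; isdigit((unsigned char)*s); s++) {
        unsigned digit = (unsigned)(*s - '0');
        if (mag > (ULLONG_MAX - digit) / 10)
            overflow = 1;            /* mag * 10 + digit would not fit */
        else
            mag = mag * 10 + digit;
    }

    if (overflow) {                  /* range check happens before negation */
        errno = ERANGE;
        return ULLONG_MAX;
    }
    return negative ? -mag : mag;    /* unsigned negation wraps */
}
```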

Michael S

2024-06-24 10:19:41 UTC

On Sun, 23 Jun 2024 20:11:09 -0700

Post by Keith Thompson
There's still some ambiguity for strtoull("-9999999999999999999",
NULL, 10) (that's well outside the range of a 64-bit integer). For
that to work as expected, we have to assume that the determination
that "the correct value is outside the range of representable values"
happens *before* the negation "is performed in the return type".
It's not clear that this problem is worth fixing (doing so would
likely make that section longer and perhaps more confusing).

There is nothing wrong with longer sections.
Personally I would prefer for each strtoxxx() function to have
its own description fully independent of all others. It would make
each of them easier to follow.
DRY is a good principle for programming, not necessarily for writing
Standards.

Tim Rentsch

2024-06-24 05:30:35 UTC

Post by Keith Thompson

Post by Tim Rentsch
[range questions for strtol(), etc]

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

I understand what these functions do, but their specification in the
C standard is a little off. To my way of thinking the impact is
minimal, but the specified behavior is either unequivocally wrong or
there are some cases that give rise to undefined behavior.

Can you give an example where the specified behavior causes undefined
behavior?

Ben gave a good answer. (My thanks to Ben for both the
content and the style of his answer.)

Lawrence D'Oliveiro

2024-06-24 00:48:12 UTC

Post by Ben Bacarisse
I think there /is/ something problematic with the wording about the
negation. It happens "in the return type" but how can
9223372036854775808 be negated in the type long long int? OK, the
negated value can be /represented/ in the type long long int but that's
not quite the same thing. On the other hand, for the unsigned return
types, the negation "in the return type" is what produces ULONG_MAX for
"-1" when the negated value, -1, can't be /represented/ in the return
type. It's a case where, over the years, I've just got used to what's
happening.

In the C23 spec, section 7.24.1.7, “The strtol, strtoll, strtoul, and
strtoull functions”, paragraph 5 begins:

If the subject sequence has the expected form and the value of
base is zero, the sequence of characters starting with the first
digit is interpreted as an integer constant according to the rules
of 6.4.4.2.

Note this is excluding any sign. So if the non-negated value cannot be
represented in the desired type, then there is no valid value to apply
negation to, so according to paragraph 8, zero is returned.

James Kuyper

2024-06-21 18:38:56 UTC

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are? unfortunately, not part of C standard.

They have been part of the C standard since C99.

Post by Michael S
BTW, I don't know what The Standard says about out-of-range inputs, but
at least https://en.cppreference.com/w/c/string/byte/strtol does not
say anything certain, especially about what is stored in *str_end.

"The strtoimax and strtoumax functions are equivalent to the strtol,
strtoll, strtoul, and strtoull functions, except that the initial
portion of the string is converted to intmax_t and uintmax_t
representation, respectively." (7.8.2.3p2)

You need to go to the descriptions of those other functions to get the
detailed specifications.

"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is
returned (according to the return type and sign of the value, if any),
and the value of the macro ERANGE is stored in errno."

As I understand it, that means that if the input string represents a
value outside of the range of representable values, then strtoimax()
should return INTMAX_MIN or INTMAX_MAX, depending upon the sign, and
strtoumax() should return UINTMAX_MAX. Both of them should store the
value of ERANGE in errno, to distinguish these results from what you
would get if the string happened to represent those values.
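A minimal error-checking sketch along those lines for strtoimax(). The
helper parse_intmax is hypothetical, and rejecting trailing characters is
a policy choice of this example, not something strtoimax() requires.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores the value in *out only
   when s is a complete, in-range decimal integer. errno must be cleared
   before the call, since INTMAX_MIN and INTMAX_MAX are also legitimate
   results; only ERANGE tells an overflow apart from a string that really
   names one of those values. */
bool parse_intmax(const char *s, intmax_t *out)
{
    char *end;
    errno = 0;
    intmax_t v = strtoimax(s, &end, 10);
    if (end == s || *end != '\0')   /* no digits, or trailing characters */
        return false;
    if (errno == ERANGE)            /* clamped to INTMAX_MIN or INTMAX_MAX */
        return false;
    *out = v;
    return true;
}
```

A strtoumax() version has the same shape, with UINTMAX_MAX as the clamped
result.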

The C standard uses endptr rather than str_end in its description of
these functions.

"... First, they decompose the input string into three parts: an
initial, possibly empty, sequence of white-space characters, a subject
sequence resembling an integer represented in some radix determined by
the value of base, and a final string of one or more unrecognized
characters, including the terminating null character of the input
string. ..." (7.24.1.7p2).

That defines what the "final string" is.

"If the subject sequence has the expected form, ... A pointer to the
final string is stored in the object pointed to by endptr, provided that
endptr is not a null pointer." (7.24.1.7p5).

"If the subject sequence is empty or does not have the expected form ...
the value of nptr is stored in the object pointed to by endptr, provided
that endptr is not a null pointer." (7.24.1.7p7)

That seems very precise and unambiguous to me, aside from what "the
expected form" is, which is described elsewhere.
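A quick way to see what ends up in *endptr, including for an out-of-range
input (the case Michael asked about): the subject sequence still covers
all of the digits, so endptr points past them even though the value is
clamped. consumed() is a hypothetical helper for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts s with strtol() in base 10 and returns
   the offset of the "final string", i.e. where endptr was made to point.
   If no conversion is performed, endptr is set to s and 0 is returned. */
ptrdiff_t consumed(const char *s, long *value)
{
    char *end;
    *value = strtol(s, &end, 10);
    return end - s;
}
```

For example, consumed("  -42abc", &v) is 5 (two spaces, the sign and two
digits), and consumed("xyz", &v) is 0.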

Kenny McCormack

2024-06-21 18:43:29 UTC

Post by James Kuyper

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are? unfortunately, not part of C standard.

They have been part of the C standard since C99.

To some people, "Standard C" means C89.

Everything after that is, like POSIX, just fluffy nonsense.

--
12% of Americans think that Joan of Arc was Noah's wife.

Michael S

2024-06-23 08:47:56 UTC

On Fri, 21 Jun 2024 18:43:29 -0000 (UTC)

Post by Kenny McCormack

Post by James Kuyper

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are? unfortunately, not part of C standard.

They have been part of the C standard since C99.

To some people, "Standard C" means C89.

That is not my case.
I was sincerely mistaken.

Post by Kenny McCormack
Everything after that is, like POSIX, just fluffy nonsense.

I don't think that POSIX is fluffy nonsense. I do know, however, that
POSIX is irrelevant for the overwhelming majority of the C programming
that I do at work.
Newer C standards are significantly more relevant, especially their
language features.
For library features, the C Standard is relevant in the sense that if a
particular standard function exists in the library that I use, then it
very likely matches the semantics of the standard.

Michael S

2024-06-22 18:18:35 UTC

On Fri, 21 Jun 2024 14:38:56 -0400

Post by James Kuyper

Post by Michael S
On Fri, 21 Jun 2024 18:28:39 +0300

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and
strtoumax().

Which are? unfortunately, not part of C standard.

They have been part of the C standard since C99.

Post by Michael S
BTW, I don't know what The Standard says about out-of-range inputs,
but at least https://en.cppreference.com/w/c/string/byte/strtol
does not say anything certain, especially about what is stored in
*str_end.

"The strtoimax and strtoumax functions are equivalent to the strtol,
strtoll, strtoul, and strtoull functions, except that the initial
portion of the string is converted to intmax_t and uintmax_t
representation, respectively." (7.8.2.3p2)
You need to go to the descriptions of those other functions to get the
detailed specifications.
"If the correct value is outside the range of representable values,
LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is
returned (according to the return type and sign of the value, if any),
and the value of the macro ERANGE is stored in errno."
As I understand it, that means that if the input string represents a
value outside of the range of representable values, then strtoimax()
should return INTMAX_MIN or INTMAX_MAX, depending upon the sign, and
strtoumax() should return UINTMAX_MAX. Both of them should store the
value of ERANGE in errno, to distinguish these results from what you
would get if the string happened to represent those values.

That is what my implementation does, but I cannot understand how it
follows from the text, especially for out-of-range negative input to
the strtou**() functions.
That creates a rather non-intuitive discontinuity:
strtoull("-18446744073709551615") => 1
strtoull("-18446744073709551616") => 18446744073709551615
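The discontinuity can be reproduced with the real strtoull(), assuming a
64-bit unsigned long long. In the first call the magnitude, 2**64-1, is
representable, so it is negated modulo 2**64, giving 1. In the second the
magnitude, 2**64, is already out of range before any negation, so the
result is clamped to ULLONG_MAX and errno is set to ERANGE. The helper
name ull_with_errno is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: calls strtoull() in base 10 and reports errno
   separately (cleared first, since strtoull() never resets it). */
unsigned long long ull_with_errno(const char *s, int *err)
{
    errno = 0;
    unsigned long long v = strtoull(s, NULL, 10);
    *err = errno;
    return v;
}

/* Expected with a 64-bit unsigned long long:
     "-18446744073709551615" -> 1, errno 0
     "-18446744073709551616" -> 18446744073709551615 (ULLONG_MAX), ERANGE */
```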

Post by James Kuyper
The C standard uses endptr rather than str_end in its description of
these functions.
"... First, they decompose the input string into three parts: an
initial, possibly empty, sequence of white-space characters, a subject
sequence resembling an integer represented in some radix determined by
the value of base, and a final string of one or more unrecognized
characters, including the terminating null character of the input
string. ..." (7.24.1.7p2).
That defines what the "final string" is.
"If the subject sequence has the expected form, ... A pointer to the
final string is stored in the object pointed to by endptr, provided
that endptr is not a null pointer." (7.24.1.7p5).
"If the subject sequence is empty or does not have the expected form
... the value of nptr is stored in the object pointed to by endptr,
provided that endptr is not a null pointer." (7.24.1.7p7)
That seems very precise and unambiguous to me, aside from what "the
expected form" is, which is described elsewhere.

Yes, this part of description is good and unambiguous.
I wonder why cppreference.com has chosen to use the less clear wording "The
functions set the pointer pointed to by str_end to point to the
character past the last numeric character interpreted."

Ben Bacarisse

2024-06-21 17:02:28 UTC

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
Yeah, now I get it. You really only need strtoimax() and strtoumax().

Which are? unfortunately, not part of C standard.

Not sure if that '?' is just a typo. Anyway, yes they are both part of
the C standard.

--
Ben.

Keith Thompson

2024-06-21 17:38:51 UTC

[...]

Post by Ben Bacarisse

Post by Michael S
Which are? unfortunately, not part of C standard.

Not sure if that '?' is just a typo. Anyway, yes they are both part of
the C standard.

strto[u]l[l] are declared in <stdlib.h>; strtoimax and strtoumax are
declared in <inttypes.h>, which can make them easy to miss.
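Following the remark that you really only need strtoimax() and
strtoumax(), here is one way to get an int-sized conversion (the strtoi()
asked about at the start of the thread) on top of strtoimax() from
<inttypes.h>. my_strtoi is a hypothetical name chosen to avoid clashing
with any library-provided function; it narrows with an explicit range
check and mirrors strtol()'s conventions of clamping and setting errno to
ERANGE. The expected values in the test assume a 32-bit int.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>

/* Hypothetical helper (not a standard function): an int analogue of
   strtol(), built on strtoimax(). Values outside int's range are clamped
   to INT_MIN or INT_MAX with errno set to ERANGE. If strtoimax() itself
   overflows, it has already set ERANGE and returned INTMAX_MIN or
   INTMAX_MAX, which the checks below then clamp further. */
int my_strtoi(const char *s, char **endptr, int base)
{
    intmax_t v = strtoimax(s, endptr, base);
    if (v > INT_MAX) {
        errno = ERANGE;
        return INT_MAX;
    }
    if (v < INT_MIN) {
        errno = ERANGE;
        return INT_MIN;
    }
    return (int)v;
}
```

As with the standard functions, the caller clears errno before the call
and checks it afterwards.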

Lawrence D'Oliveiro

2024-06-22 06:43:45 UTC

Post by Keith Thompson
strto[u]l[l] are declared in <stdlib.h>; strtoimax and strtoumax are
declared in <inttypes.h>, which can make them easy to miss.

The first thing I do is check the man pages
<https://manpages.debian.org/3/strtoimax.3.en.html>:

STANDARDS

POSIX.1-2001, POSIX.1-2008, C99.

Michael S

2024-06-22 18:04:38 UTC

On Fri, 21 Jun 2024 10:38:51 -0700

Post by Keith Thompson
[...]

Post by Ben Bacarisse

Post by Michael S
Which are? unfortunately, not part of C standard.

Not sure if that '?' is just a typo. Anyway, yes they are both
part of the C standard.

strto[u]l[l] are declared in <stdlib.h>; strtoimax and strtoumax are
declared in <inttypes.h>, which can make them easy to miss.

Maybe that is the reason. But frankly, I expected cppreference.com to
do better. At a minimum, strtoimax should have been listed in the "See
also" section on this page:
https://en.cppreference.com/w/c/string/byte/strtol

Lawrence D'Oliveiro

2024-06-22 23:22:28 UTC

Post by Michael S
But frankly, I expected cppreference.com to do better.

This is why we have authoritative references.

Keith Thompson

2024-06-22 23:43:47 UTC

Post by Michael S
On Fri, 21 Jun 2024 10:38:51 -0700

Post by Keith Thompson
[...]

Post by Ben Bacarisse

Post by Michael S
Which are? unfortunately, not part of C standard.

Not sure if that '?' is just a typo. Anyway, yes they are both
part of the C standard.

strto[u]l[l] are declared in <stdlib.h>; strtoimax and strtoumax are
declared in <inttypes.h>, which can make them easy to miss.

Maybe that is the reason. But frankly, I expected cppreference.com to
do better. At a minimum, strtoimax should have been listed in the "See
also" section on this page:
https://en.cppreference.com/w/c/string/byte/strtol

I agree, and I'm going to suggest that change. (Editing of the page is
currently disabled for new users due to vandalism and I've had some
problems with my account.)

But <https://en.cppreference.com/w/c/string/byte/strtoimax> does
indicate that both functions are declared in <inttypes.h> and has
references to the C99 and later standards.

Michael S

2024-06-21 16:00:08 UTC

On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

Post by Kenny McCormack
2) Because it means that the two functions are literally the same
code. Both calculate the same bit pattern - the difference is only in
the caller's interpretation of the result.

In the implementation that I just tested, strtoll and strtoull are not
the same. They deliver different answers when the input is out of range.

Keith Thompson

2024-06-21 17:50:40 UTC

Post by Michael S
On Fri, 21 Jun 2024 13:58:01 -0000 (UTC)

[...]

Post by Michael S
In the implementation that I just tested, strtoll and strtoull are not
the same. They deliver different answers when the input is out of range.

Yes, that's the required behavior. N1570 7.22.1.4p8:
"""
The strtol, strtoll, strtoul, and strtoull functions return the
converted value, if any. If no conversion could be performed, zero is
returned. If the correct value is outside the range of representable
values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or
ULLONG_MAX is returned (according to the return type and sign of the
value, if any), and the value of the macro ERANGE is stored in errno.
"""

N3220 has identical wording in 7.24.1.7p8. The wording for strtoimax
and strtoumax (<inttypes.h>) is equivalent.
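This is also why strtoll() and strtoull() cannot simply compute one bit
pattern for the caller to reinterpret. The hypothetical convert_both()
below runs the same string through both functions. For an in-range "-1"
the two results have identical bits on a two's-complement system, but
for "18446744073709551615" (ULLONG_MAX, assuming 64 bits) strtoll() clamps
to LLONG_MAX with ERANGE while strtoull() returns all ones with no error.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result record: each converted value with its errno. */
struct ll_ull_result {
    long long s;
    int s_err;
    unsigned long long u;
    int u_err;
};

/* Runs the same string through strtoll() and strtoull() in base 10,
   clearing errno before each call. */
struct ll_ull_result convert_both(const char *str)
{
    struct ll_ull_result r;
    errno = 0;
    r.s = strtoll(str, NULL, 10);
    r.s_err = errno;
    errno = 0;
    r.u = strtoull(str, NULL, 10);
    r.u_err = errno;
    return r;
}
```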

Kaz Kylheku

2024-06-22 22:07:58 UTC

Post by Kenny McCormack

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a sign
(+ or -). This is odd, since you'd (*) think that a sign (particularly, a
minus) would be a syntax error in parsing for an unsigned value.

There have been some useful responses on this thread, which is Good. Of
course, there have also been the usual crappola-type responses, but one must
learn to take the good with the bad.
Anyway, I think the takeaway is that while it is what it is, an argument
can certainly be made that it would have been better for the unsigned
versions of these functions to not accept signed input. If I were designing
it, I would have had strtoul("-1") be a syntax error (not a C language
syntax error - but a meta-language syntax error) - or, if not that, then
have it return 1, not 2**N-1. But that's just me.

An alternative would be for the current minus-handling behavior to apply
only when the base is specified as zero, which is where the other hacks
live, like a leading 0 for octal and 0x for hexadecimal (the latter is
also recognized in base 16).
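That rule can be approximated today with a wrapper. The function
parse_ulong_sign_rule below is hypothetical: it rejects a leading minus
sign unless base is 0, and otherwise defers to strtoul(). Dropping the
base check would give the stricter behavior Kenny describes, where "-1"
is always a syntax error.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical wrapper: accepts a minus sign only when base == 0,
   then converts with strtoul(). Returns false on a rejected sign,
   when no digits were found, or when the value is out of range. */
bool parse_ulong_sign_rule(const char *s, int base, unsigned long *out)
{
    const char *p = s;
    while (isspace((unsigned char)*p))  /* strtoul() skips these too */
        p++;
    if (*p == '-' && base != 0)
        return false;                   /* minus rejected outside base 0 */

    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, base);
    if (end == s || errno == ERANGE)
        return false;
    *out = v;
    return true;
}
```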

Post by Kenny McCormack
I appreciate the responses indicating that it was probably done the way it
1) Because it makes it more useful for C compiler writers - who were
seen as the primary audience.
2) Because it means that the two functions are literally the same code.
Both calculate the same bit pattern - the difference is only in the
caller's interpretation of the result.

3) The behavior is also useful for IT people who understand two's
complement computer arithmetic:

voipserver --debug-mask=-1 # more convenient than --debug-mask=0xFFFFFFFF

That is also why the 0x prefix, and the leading 0 for octal, are
supported when base is 0.

It supports not only compiler writing but system utilities.
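A sketch of how a utility might parse such an option value. parse_mask
and the --debug-mask option are illustrative only. With base 0, "0x..."
is read as hexadecimal and a leading 0 as octal, and "-1" comes back as
ULONG_MAX, i.e. all bits set. Note that with a 64-bit unsigned long this
is 0xFFFFFFFFFFFFFFFF, wider than the 0xFFFFFFFF in the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parser for the VALUE in --debug-mask=VALUE.
   Accepts decimal, 0x-prefixed hex, 0-prefixed octal, and "-1"
   (which strtoul() maps to ULONG_MAX, all bits set). */
bool parse_mask(const char *arg, unsigned long *mask)
{
    char *end;
    errno = 0;
    unsigned long v = strtoul(arg, &end, 0);
    if (end == arg || *end != '\0' || errno == ERANGE)
        return false;
    *mask = v;
    return true;
}
```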

Richard Kettlewell

2024-06-23 16:39:37 UTC

Post by Kenny McCormack
Interestingly, I note that strtoul() accepts strings that begin with a
sign (+ or -). This is odd, since you'd (*) think that a sign
(particularly, a minus) would be a syntax error in parsing for an
unsigned value.
Further, although the (Linux) man page is more than a bit murky on the
subject, it seems that the result of parsing, say, "-1", with
strtoul() is the largest unsigned value (usually, 2**N-1 or a lot of
F's (in hex)). Whereas, I would expect it to be 1 (i.e., just take
the absolute value).
Comments? I find this all very counterintuitive.

I can think of contexts where the string -1 would be read as meaning 1
(e.g. GF(2^n)) but I don’t think most people would think they were a
sensible analogy for strtoul behavior. Its behavior seems consistent with
the normal meaning of unary minus (i.e. additive inverse) and of course
with C’s treatment of unsigned integer types.
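The additive-inverse reading can be checked directly: for an in-range
magnitude n, strtoul() applied to "-n" gives the same result as C's unary
minus applied to the unsigned long value n. minus_matches_unary_minus is
a hypothetical helper, and it assumes its argument has the form "-n" with
n representable in unsigned long.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical check: neg must look like "-n" with n in range.
   Compares strtoul("-n") with -(unsigned long)n; both are computed
   modulo ULONG_MAX + 1, so they should always agree. */
bool minus_matches_unary_minus(const char *neg)
{
    unsigned long parsed = strtoul(neg, NULL, 10);  /* "-n" */
    unsigned long n = strtoul(neg + 1, NULL, 10);   /* "n"  */
    return parsed == -n;
}
```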

--
https://www.greenend.org.uk/rjk/
