Discussion:
Which tools are available for catching UB?
Anthony Cuozzo
2024-01-11 04:15:52 UTC
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.

Are there any other tools out there besides what ships with e.g., GCC &
Clang?

Thanks,
--Anthony Cuozzo
David Brown
2024-01-11 12:43:59 UTC
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
Both gcc and clang have "sanitizers". You compile the code with the
appropriate options, and the code is augmented with checks for different
kinds of UB, detected at run-time. gcc and clang have many of these in
common, and some that are only implemented in one of them. Some
sanitizers can have significant impact on code speed, others do not.
You will want to try things with different flags to see what works best
for you.
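
A minimal sketch of the kind of code these run-time checks target, assuming nothing beyond standard C. The function names add_unchecked and add_checked are hypothetical, made up for illustration. The first form has undefined behaviour when the sum does not fit in an int; the second tests the range first so the addition is never evaluated out of range.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Unchecked: undefined behaviour if a + b does not fit in an int. */
int add_unchecked(int a, int b)
{
    return a + b;
}

/* Checked: tests the range first, so the addition below cannot overflow.
 * Returns false (and leaves *out untouched) when the sum is not representable. */
bool add_checked(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Built with -fsanitize=undefined (an option both gcc and clang accept), a call such as add_unchecked(INT_MAX, 1) should be reported at run time, while add_checked simply returns false for the same arguments.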

<https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fsanitize_003dundefined>

<https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>


Both gcc and clang can also do a great deal of static error checking
which can find some kinds of UB before running the code. And there are
other tools such as clang-tidy, and third-party linters and checkers,
that can help. (Some are quite expensive.)
Anthony Cuozzo
2024-01-11 23:15:22 UTC
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC
& Clang?
Both gcc and clang have "sanitizers".  You compile the code with the
appropriate options, and the code is augmented with checks for different
kinds of UB, detected at run-time.  gcc and clang have many of these in
common, and some that are only implemented in one of them.  Some
sanitizers can have significant impact on code speed, others do not. You
will want to try things with different flags to see what works best for
you.
<https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fsanitize_003dundefined>
<https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>
Both gcc and clang can also do a great deal of static error checking
which can find some kinds of UB before running the code.  And there are
other tools such as clang-tidy, and third-party linters and checkers,
that can help.  (Some are quite expensive.)
I suppose I was/am looking for static analysis tools which focus on UB,
but now that I've given it more thought I realize that only a subset of
UB can be detected at compile time.

Semi-related: Do you know if there's a resource which breaks down UB per
standard? I'd like to see how things have changed over time.

Thanks,
--Anthony
Keith Thompson
2024-01-12 00:09:56 UTC
Anthony Cuozzo <***@cuozzo.us> writes:
[...]
Post by Anthony Cuozzo
I suppose I was/am looking for static analysis tools which focus on
UB, but now that I've given it more thought I realize that only a
subset of UB can be detected at compile time.
Which, in many or most cases, is exactly why it's UB.

Ideally, something's behavior is left undefined because it's impractical
to detect the problem. In some cases, behavior has been left undefined
(or unspecified, or implementation-defined) because existing
implementations behave differently.
Post by Anthony Cuozzo
Semi-related: Do you know if there's a resource which breaks down UB
per standard? I'd like to see how things have changed over time.
Each edition of the standard has an annex (Annex J in the case of C11)
that summarizes unspecified, undefined, and implementation-defined
behaviors. The standards themselves cost money, but drafts are freely
available.

Some instances of undefined behavior are specified explicitly. Others
are undefined just because the standard provides no definition. Both
kinds are equivalent, and can in principle result in the same kinds of
Bad Things Happening.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
David Brown
2024-01-12 13:50:53 UTC
Post by Anthony Cuozzo
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC
& Clang?
Both gcc and clang have "sanitizers".  You compile the code with the
appropriate options, and the code is augmented with checks for
different kinds of UB, detected at run-time.  gcc and clang have many
of these in common, and some that are only implemented in one of
them.  Some sanitizers can have significant impact on code speed,
others do not. You will want to try things with different flags to see
what works best for you.
<https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fsanitize_003dundefined>
<https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html>
Both gcc and clang can also do a great deal of static error checking
which can find some kinds of UB before running the code.  And there
are other tools such as clang-tidy, and third-party linters and
checkers, that can help.  (Some are quite expensive.)
I suppose I was/am looking for static analysis tools which focus on UB,
but now that I've given it more thought I realize that only a subset of
UB can be detected at compile time.
That is absolutely correct. In fact, most UB can only be detected at
run time. Static analysis (in a compiler, or dedicated tools) can
usually only see some kinds of /potential/ UB. For example, if you
write "int foo(void) { return 1 / 0; }", that is not UB in itself - it
is only UB if your program calls "foo". And usually the compiler isn't
able to determine what code will actually be called when you run the
program, unless it can trace the execution unconditionally from main().
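
To make the "potential UB" point concrete, here is a rough sketch with hypothetical names (quotient, quotient_is_defined, quotient_guarded). The division is well-defined for most arguments, and a tool looking at quotient alone cannot tell whether a caller will ever pass one of the bad pairs. The INT_MIN / -1 case assumes a two's complement int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Well-defined for most arguments; undefined only for the pairs noted below. */
int quotient(int a, int b)
{
    return a / b;   /* UB if b == 0, or if a == INT_MIN and b == -1 */
}

/* True when a / b has a defined result (two's complement int assumed). */
bool quotient_is_defined(int a, int b)
{
    return b != 0 && !(a == INT_MIN && b == -1);
}

/* Guarded form: the assertion fires before the undefined division is reached. */
int quotient_guarded(int a, int b)
{
    assert(quotient_is_defined(a, b));
    return quotient(a, b);
}
```

Whether quotient ever misbehaves depends entirely on the values that reach it at run time, which is why a static checker can usually only flag it as a possibility.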

But it is, IMHO, a good idea to find as many of your code's bugs as
possible using static checking - it's the easiest and cheapest time to
do it. gcc and clang both have quite sophisticated warnings and static
analysis features (with steadily more for each new compiler release),
and clang also has some stand-alone tools for the job. There are also
dedicated tools for particular use-cases (such as tools for checking
Linux kernel code for certain kinds of problems). And there are quite a
number of commercial tools that do very sophisticated static error
checking, if your budget stretches to buying them.
Post by Anthony Cuozzo
Semi-related: Do you know if there's a resource which breaks down UB per
standard? I'd like to see how things have changed over time.
Each C standard version has an Annex that lists the explicit UB
described in the standard - but remember that things that have no
standards-defined behaviour are also UB in C (though a compiler may
choose to define them).
Richard Kettlewell
2024-01-12 08:51:24 UTC
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC
& Clang?
Dynamic analysis:

* https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html,
search for ‘sanitize’. Instruments executable to detect various issues
at runtime.

* https://www.gnu.org/software/libc/manual/html_node/Source-Fortification.html
Additional bounds checking.

* https://clang.llvm.org/docs/index.html, search for ‘sanitize’. Ditto.

* https://valgrind.org/. Detects various issues in unmodified
executables.
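
As an illustration of what the bounds-checking tools above look at, here is a small sketch of a copy that checks the destination size itself. The name copy_name is hypothetical. With glibc, defining _FORTIFY_SOURCE (for example -D_FORTIFY_SOURCE=2 with optimisation enabled) adds checks to calls such as memcpy and strcpy where the compiler knows the destination size; the explicit test below does not rely on that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies src, including its terminator, into dst only if it fits.
 * Returns false and leaves dst untouched when dst is too small. */
bool copy_name(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (dst_size == 0 || n >= dst_size)
        return false;
    memcpy(dst, src, n + 1);   /* n + 1 <= dst_size, so this stays in bounds */
    return true;
}
```

Dropping the size test and copying unconditionally is the sort of overrun that the run-time tools in this list are meant to catch.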

Static analysis:

* https://clang-analyzer.llvm.org/. Quite limited and struggles with
false positives IME.

* https://www.synopsys.com/software-integrity/static-analysis-tools-sast/coverity.html
Extensive checking and does find many real issues but also produces a
lot of false positives. Pricey.

* https://scan.coverity.com/. Free version of the above for open source
projects.
--
https://www.greenend.org.uk/rjk/
Malcolm McLean
2024-01-18 18:17:16 UTC
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
Thanks,
--Anthony Cuozzo
Almost by definition you can't catch all undefined behaviour, since it
is "undefined".
Out of bounds array accesses can be caught by sanitizers or valgrind.
C is notorious for this bug, since dynamic arrays have no way of
obtaining the size by querying the pointer, so size and array have to be
passed in separate variables, and the potential for them getting out of
synch is high.
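
One common way to keep the size and the array from drifting apart is to carry them together. A minimal sketch, with the hypothetical names int_span and span_get:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pointer and element count travel together, so they cannot get out of step. */
struct int_span {
    int   *data;
    size_t len;
};

/* Bounds-checked read: returns false instead of indexing past the end. */
bool span_get(struct int_span s, size_t i, int *out)
{
    assert(out != NULL);
    if (s.data == NULL || i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}
```

This only helps if len is set correctly where the span is created; it does not recover the size from the pointer.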

But undefined behaviour like a shift which is out of range is harder to
catch. Whilst it is undefined in C, it often compiles to valid and
perfectly well-behaved machine code.
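
For the shift case, a guard in the source is one option. A sketch with the hypothetical name shl_checked, assuming unsigned int has no padding bits so that its width is sizeof(unsigned int) * CHAR_BIT:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Shifts left only when the count is smaller than the width of unsigned int.
 * Assumes unsigned int has no padding bits. */
bool shl_checked(unsigned int value, unsigned int count, unsigned int *out)
{
    assert(out != NULL);
    if (count >= sizeof(unsigned int) * CHAR_BIT)
        return false;          /* value << count would be undefined here */
    *out = value << count;     /* defined: bits shifted out are discarded */
    return true;
}
```

Both gcc and clang also accept -fsanitize=shift, which reports the unguarded form at run time.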
Lew Pitcher
2024-01-18 19:08:59 UTC
Post by Anthony Cuozzo
The only tool I use regularly for identifying instances of undefined
behavior is the semantics compiler "kcc" from RV-Match.
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
By definition (for instance, C11 Section 3.4.3: "undefined behavior") undefined
behaviour is "behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes no requirements".

Outside of the "erroneous" constructs and data, this also means that "nonportable"
program constructs, for which the International Standard imposes no requirements,
invoke "undefined behaviour", as far as the ISO C standard is concerned.

This means that a single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour". So, a program
that calls CopyFile() (a Microsoft Windows API) or open() (a POSIX API) invokes
"undefined behaviour".

While it is certainly possible to write C programs that adhere entirely to the
ISO C standard, many C programs (dare I say, most C programs?) invoke /some/
amount of "undefined behaviour" wrt the C standard, even when the behaviour
/is/ defined by other standards and sources.

So, does "kcc" from RV-Match catch these forms of "undefined behaviour"?
--
Lew Pitcher
"In Skills We Trust"
James Kuyper
2024-01-18 19:42:16 UTC
On 1/18/24 14:08, Lew Pitcher wrote:
...
Post by Lew Pitcher
By definition (for instance, C11 Section 3.4.3: "undefined behavior") undefined
behaviour is "behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes no requirements".
Outside of the "erroneous" constructs and data, this also means that "nonportable"
program constructs, for which the International Standard imposes no requirements,
invoke "undefined behaviour", as far as the ISO C standard is concerned.
This means that a single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour". So, a program
that calls CopyFile() (a Microsoft Windows API) or open() (a POSIX API) invokes
"undefined behaviour".
While it is certainly possible to write C programs that adhere entirely to the
ISO C standard, many C programs (dare I say, most C programs?) invoke /some/
amount of "undefined behaviour" wrt the C standard, even when the behaviour
/is/ defined by other standards and sources.
Keep in mind that "undefined behavior" in C means ONLY that "this
international standard" imposes no requirements. If requirements are
imposed by some other document, such as the documentation for the
library that you're using, those requirements can be sufficient to make
your program useful. If that library's documentation describes the
behavior of the particular function you're calling, that's sufficient
for that function call. If it also claims compatibility with a given
version of the C standard, that implies that when compiling and linking
with that version of the C standard, all requirements that the C
standard would impose on all of your code except that function call also
apply - not because the C standard says so, but because the library's
documentation says so.
Lew Pitcher
2024-01-18 20:24:59 UTC
Post by James Kuyper
...
Post by Lew Pitcher
By definition (for instance, C11 Section 3.4.3: "undefined behavior") undefined
behaviour is "behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes no requirements".
Outside of the "erroneous" constructs and data, this also means that "nonportable"
program constructs, for which the International Standard imposes no requirements,
invoke "undefined behaviour", as far as the ISO C standard is concerned.
This means that a single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour". So, a program
that calls CopyFile() (a Microsoft Windows API) or open() (a POSIX API) invokes
"undefined behaviour".
While it is certainly possible to write C programs that adhere entirely to the
ISO C standard, many C programs (dare I say, most C programs?) invoke /some/
amount of "undefined behaviour" wrt the C standard, even when the behaviour
/is/ defined by other standards and sources.
Keep in mind that "undefined behavior" in C means ONLY that "this
international standard" imposes no requirements.
My point, exactly.

My question to the OP was, in effect, is the tool that the OP uses
"strict" in its detection of UB (i.e. calling a program that uses
POSIX APIs as exhibiting "undefined behaviour") or does it allow a
looser interpretation?

[snip]
--
Lew Pitcher
"In Skills We Trust"
Tim Rentsch
2024-01-26 03:57:53 UTC
[A] single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour".
That isn't right. The C standard allows previously translated
translation units "[to] be preserved individually or in libraries."
Those translation units don't have to be your own code or even
necessarily stored, or translated, on the same machine. In
translation phase 8, "[l]ibrary components are linked to satisfy
external references to functions and objects not defined in the
current translation." The C standard doesn't specify how the
libraries are located, or even require that you be able to inspect
them, but clearly does require that libraries be consulted to satisfy
external references. We don't know what code in the libraries will
do, but there is a requirement /on the implementation/ that they be
linked against in phase 8. The presence of that requirement means
that linking to, or calling, such an external reference is not ipso
facto undefined behavior. (Obviously it could be undefined behavior
for other reasons, but not just by virtue of there being a call.)

Not knowing what something will do is not the same as undefined
behavior. The question is: does the C standard give a requirement
about what implementations have to do? In this case it does. An
implementation is not free to do whatever it wants just because a
library was previously translated on a different machine. Code in
a library might (emphasis _might_) provoke undefined behavior if it
is called, but that depends on what the library code is, and is not
something an implementation can just arbitrarily choose to do on its
own. It's important to understand the difference.
Kaz Kylheku
2024-01-26 04:52:39 UTC
Post by Tim Rentsch
[A] single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour".
That isn't right. The C standard allows previously translated
translation units "[to] be preserved individually or in libraries."
Those translation units don't have to be your own code or even
necessarily stored, or translated, on the same machine.
This is a strawman interpretation of what Lew is almost certainly
saying, which is the salient point that using a function that is not
somewhere in your program (any translation unit from your sources or any
translated units you brought to the table yourself), and not in the
standard, is undefined behavior.

He can't be literally saying that calling a function foo is undefined
behavior, even if it's found in a libfoo.a that is brought from
another machine, and which has no compatibility issues (like wrong
architecture, unsupported object format, wrong ABI), and is being linked
to the program, and used correctly according to its documentation.

That would be silly, and uncharacteristic of Lew's level of experience,
so it can't be the right interpretation.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
Tim Rentsch
2024-02-10 10:06:40 UTC
Post by Kaz Kylheku
Post by Tim Rentsch
[A] single call to a function not defined by your program source
code or by the ISO C standard will invoke "undefined behaviour".
That isn't right. The C standard allows previously translated
translation units "[to] be preserved individually or in libraries."
Those translation units don't have to be your own code or even
necessarily stored, or translated, on the same machine.
This is a strawman interpretation of what Lew is almost certainly
saying,
No, it isn't. You misunderstood my statement.
Post by Kaz Kylheku
which is the salient point that using a function that is not
somewhere in your program (any translation unit from your sources
or any translated units you brought to the table yourself), and
not in the standard, is undefined behavior.
No, it isn't. Whether a library, for example, was something you put
on the machine yourself, or was put there by a hacker without your
knowledge, doesn't affect the presence or absence of undefined
behavior. All that matters is what's in the library. It's
perfectly possible for a library installed by a hacker to perform
only well-defined operations, be well-formed and ABI-compatible,
etc. Just because you don't know what is in the library doesn't
make it undefined behavior.

Kaz Kylheku
2024-01-18 19:41:57 UTC
Post by Anthony Cuozzo
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
All the tools that waste their time hanging out on comp.lang.c are
pretty good for catching UB.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @***@mstdn.ca
NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.
Chris M. Thomasson
2024-01-18 21:18:11 UTC
Post by Kaz Kylheku
Post by Anthony Cuozzo
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
All the tools that waste their time hanging out on comp.lang.c are
pretty good for catching UB.
A little harsh? :^)
Kenny McCormack
2024-01-19 03:08:17 UTC
Post by Kaz Kylheku
Post by Anthony Cuozzo
Are there any other tools out there besides what ships with e.g., GCC &
Clang?
All the tools that waste their time hanging out on comp.lang.c are
pretty good for catching UB.
Well done, sir!

(nice play on the word "tool")
--
Note that Oprah actually is all the things that The Donald only wishes he were.
For one thing, she actually *is* a billionaire. She's also actually self-made,
came from nothing, knows how to run businesses, never went bankrupt, is smart
and is mentally stable.