Null Pointer exceptions

Discussion:

jeff

2015-04-22 21:34:25 UTC

Howdy

I've got a large C codebase that frequently triggers bugs where pointers
that shouldn't be null, somehow end up getting nulled.

It's easy enough to fix up each individual bug when it gets reported,
just by adding a null check:

if ((void*) ptr != (void*)NULL)
{
    // existing code that triggered a crash
}

However, as these null conditions only show up occasionally, there always
seem to be more bugs left to find.

My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

Respectfully,
jeff

James Kuyper

2015-04-22 22:02:28 UTC

Post by jeff
Howdy
I've got a large C codebase that frequently triggers bugs where pointers
that shouldn't be null, somehow end up getting nulled.
It's easy enough to fix up each individual bug when it gets reported,
if ((void*) ptr != (void*)NULL)
{
    // existing code that triggered a crash
}
However, as these null conditions only show up occasionally, there always
seem to be more bugs left to find.
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

Every time a pointer that shouldn't be null gets dereferenced, you
should investigate to find out why it was null. Don't simply patch over
the problem by checking whether it is null - find out why it was
unexpectedly null. Fix that problem. Then look for other similar
problems - if this problem comes up repeatedly, you've probably made the
same kind of mistake (whatever that mistake might be) in many different
locations.
You'll never get anywhere with debugging your code if you concentrate
your efforts on covering up symptoms, rather than tracking down causes.
You're like a doctor who sees a patient coughing up blood, and
concentrates his attention on cleaning up the blood. A proper doctor
needs to figure out why the patient is bleeding internally before the
doctor can put a stop to it, otherwise the bleeding will just continue
until the patient dies as a result.

Stephen Sprunk

2015-04-22 23:30:43 UTC

Post by James Kuyper

Post by jeff
My question is: is there a way to deal with all of these problems
in one swoop, by ignoring all code that dereferences a null
pointer, i.e. null dereference becomes a NO-OP instead of
triggering an exception?

Every time a pointer that shouldn't be null gets dereferenced, you
should investigate to find out why it was null. Don't simply patch
over the problem by checking whether it is null - find out why it
was unexpectedly null. Fix that problem. Then look for other similar
problems - if this problem comes up repeatedly, you've probably made
the same kind of mistake (whatever that mistake might be) in many
different locations. You'll never get anywhere with debugging your
code if you concentrate your efforts on covering up symptoms, rather
than tracking down causes. You're like a doctor who sees a patient
coughing up blood, and concentrates his attention on cleaning up the
blood. A proper doctor needs to figure out why the patient is bleeding
internally before the doctor can put a stop to it, otherwise the
bleeding will just continue until the patient dies as a result.

Figuring out the cause (and treating it) takes time, and if you do
nothing for the symptoms in the meantime, the patient may die anyway.
If no cure exists for the particular disease, then all you can do is
manage the symptoms.

For unreproducible bugs, managing the symptoms may be the only option
because you don't have enough data (or time) to find the root cause.
You can add some sanity checks and maybe some extra logging in case it
happens again, but then you move on to the next bug in the queue.
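
A minimal sketch of the "sanity check plus extra logging" idea, assuming a hypothetical LOG_IF_NULL macro and an illustrative struct session (neither comes from the original code base). The check still guards the dereference, but it also records where the unexpected null was seen, so the next occurrence leaves some data behind for diagnosis:

```c
#include <stdio.h>

/* Hypothetical helper: evaluates to 1 and logs the file, line, and
   expression text when ptr is null; evaluates to 0 otherwise. */
#define LOG_IF_NULL(ptr)                                          \
    ((ptr) == NULL                                                \
         ? (fprintf(stderr, "%s:%d: unexpected null: %s\n",       \
                    __FILE__, __LINE__, #ptr), 1)                 \
         : 0)

/* Illustrative type, not taken from the thread. */
struct session { int id; };

/* Returns the session id, or -1 (after logging) if s is unexpectedly null. */
int session_id(const struct session *s)
{
    if (LOG_IF_NULL(s))
        return -1;
    return s->id;
}
```

The caller still has to decide what a -1 means; the log line only makes the symptom visible instead of silently skipping it.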

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

James Kuyper

2015-04-22 23:44:58 UTC

...

Post by Stephen Sprunk

Post by James Kuyper
than tracking down causes. You're like a doctor who sees a patient
coughing up blood, and concentrates his attention on cleaning up the
blood. A proper doctor needs to figure out why the patient is bleeding
internally before the doctor can put a stop to it, otherwise the
bleeding will just continue until the patient dies as a result.

Figuring out the cause (and treating it) takes time, and if you do
nothing for the symptoms in the meantime, the patient may die anyway.

True - symptoms do need to be treated - but that should generally not be
the only thing you do about them.

At a minimum, you should seek a diagnosis. The diagnosis might indicate
that there is no need for anything more than symptomatic relief, or it
might indicate that there is no known cure, in which case symptomatic
relief is the best you can do - but it could also identify a treatment
plan that should be put into effect as soon as possible. Until you at
least attempt diagnosis, you can't be sure which of those cases applies.

Post by Stephen Sprunk
If no cure exists for the particular disease, then all you can do is
manage the symptoms.

Granted - but seldom completely relevant to computer bugs. Almost all
computer bugs are solvable, once you identify them sufficiently well -
the hard part is identifying them.

Stephen Sprunk

2015-04-23 16:53:55 UTC

Post by James Kuyper

Post by Stephen Sprunk

Post by James Kuyper
You're like a doctor who sees a patient coughing up blood, and
concentrates his attention on cleaning up the blood. A proper
doctor needs to figure out why the patient is bleeding internally
before the doctor can put a stop to it, otherwise the bleeding
will just continue until the patient dies as a result.

Figuring out the cause (and treating it) takes time, and if you do
nothing for the symptoms in the meantime, the patient may die
anyway.

True - symptoms do need to be treated - but that should generally not
be the only thing you do about them.
At a minimum, you should seek a diagnosis. The diagnosis might
indicate that there is no need for anything more than symptomatic
relief, or it might indicate that there is no known cure, in which
case symptomatic relief is the best you can do - but it could also
identify a treatment plan that should be put into effect as soon as
possible. Until you at least attempt diagnosis, you can't be sure
which of those cases applies.

Of course; that's why I said "in the meantime".

Post by James Kuyper

Post by Stephen Sprunk
If no cure exists for the particular disease, then all you can do
is manage the symptoms.

Granted - but seldom completely relevant to computer bugs. Almost
all computer bugs are solvable, once you identify them sufficiently
well - the hard part is identifying them.

Indeed, and that was the point behind the following paragraph that
discussed unreproducible bugs.

My experience is that if we can reproduce a bug, we can almost certainly
fix it--and usually with very little effort unless it requires a design
change. However, if we can't reproduce a bug, it's much more likely we
can't fix it--and even if we try, we can't be sure we actually did so.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Richard Heathfield

2015-04-23 17:59:04 UTC

On 23/04/15 17:53, Stephen Sprunk wrote:

<snip>

Post by Stephen Sprunk
My experience is that if we can reproduce a bug, we can almost certainly
fix it--and usually with very little effort unless it requires a design
change. However, if we can't reproduce a bug, it's much more likely we
can't fix it--and even if we try, we can't be sure we actually did so.

Unreproducible bugs /tend/ to be pointer- or array-related. That is not,
of course, universally true, but it can be (forgive me) a pointer in the
right direction.

If I can't reproduce a bug, I'll try to do the next best thing, which is
to look for "clever" code. The chances are good that it isn't as clever
as it likes to make out. It is, at least, a starting point.

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Barry Schwarz

2015-04-22 22:17:04 UTC

As has been mentioned, this fixes nothing and serves only to
camouflage the problem.

But on the rare occasion when this is the "proper" approach, what is
with those useless casts?

Post by jeff
However, as these null conditions only show up occasionally, there always
seem to be more bugs left to find.
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

The way to deal with the problems is to perform a proper evaluation of
the code and correct the design/coding errors that allow the situation
to occur in the first place.

--
Remove del for email

James Dow Allen

2015-04-23 04:11:06 UTC

Post by Barry Schwarz
As has been mentioned, this fixes nothing and serves only to
camouflage the problem.

Yes.

[true anecdote] Twenty-six years ago I was a consultant to Sun
Microsystems charged with investigating I/O performance problems. A new
SCSI feature allowed multiple drives to interleave data on the same
cable, but customers complained that they weren't getting the expected
performance boost. Sure enough, I saw that when attempting concurrent
accesses, a few would occur and then the system would drop down to
one-at-a-time scheduling.

I traced the problem to a line in the SCSI device driver:
    if (ctrlp != NULL)
        start_whatever(ctrlp);
I didn't know why ctrlp had gone NULL in the hugely complicated driver
but I set ctrlp to point to the controller structure, called
start_whatever(), and got the expected performance boost.

I left the details of the fix (why was ctrlp null anyway???) to one of
the several guys on the SCSI device driver team. (I don't recall his
name; I think he was a young guy with blond hair).

James Dow Allen

Robert Wessel

2015-04-22 22:19:14 UTC

Short answer: no.

Longer answer: that would not actually be fixing the problem, just
patching around the symptom. Either the caller is broken, in that
it's passing a null pointer to a routine that requires a valid
pointer, or the routine is broken in its handling of a null pointer
as an input (and the simplistic fix you illustrated above may or may
not actually fix the problem, and quite possibly may introduce further
problems), or the interface is underspecified when it comes to
handling null pointers.
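
One way to make the interface question explicit is to state the null-pointer contract at the top of each routine. The sketch below is only an illustration; name_length and name_length_or_zero are hypothetical functions, not part of the code base under discussion. The first requires a valid pointer and says so with an assertion, so a broken caller is caught at the call site; the second documents that null is an accepted input:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Contract: name must not be null. A null argument is a caller bug,
   and the assertion stops the program right where that bug shows up. */
size_t name_length(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}

/* Contract: name may be null, in which case it is treated as empty. */
size_t name_length_or_zero(const char *name)
{
    return name != NULL ? strlen(name) : 0;
}
```

Note that assert() is compiled out when NDEBUG is defined, so this documents and enforces the contract in debug builds only.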

IOW, you have bugs in your program; the correct solution is to fix
them, rather than attempting to patch the symptoms.

Ugly answer: you might be able to write highly platform specific code
that intercepts the abend, attempts to patch things up, and resumes
the code. That's been attempted many times, the usual end results are
terrible (after a few months or years you discover that while you have
been blissfully free of abends, your database is now hopelessly
corrupted, turning a nuisance into an existential crisis).

Ben Bacarisse

2015-04-22 22:46:45 UTC

Post by jeff
I've got a large C codebase that frequently triggers bugs where pointers
that shouldn't be null, somehow end up getting nulled.
It's easy enough to fix up each individual bug when it gets reported,
if ((void*) ptr != (void*)NULL)

if (ptr != NULL) is enough. You don't need the casts. In fact,
if (ptr) is enough, though some consider that too terse.
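
For illustration, the three spellings side by side; the function names are made up for this sketch. All three give the same result for any pointer value:

```c
#include <stddef.h>

/* The original form, with the redundant casts. */
int check_with_casts(int *ptr)
{
    return ((void*) ptr != (void*)NULL);
}

/* The same test without the casts. */
int check_plain(int *ptr)
{
    return ptr != NULL;
}

/* The terse form: a pointer in a condition compares unequal to zero
   exactly when it is not a null pointer. */
int check_terse(int *ptr)
{
    return ptr ? 1 : 0;
}
```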

Post by jeff
{
// existing code that triggered a crash
}
However, as these null conditions only show up occasionally, there always
seem to be more bugs left to find.

This sounds very odd. You aren't fixing bugs, you are patching up the
effect of bugs. Putting sticking plasters on a gaping wound comes to
mind.

Post by jeff
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

I don't think you should be considering that sort of thing at all.
There is almost certainly some other underlying problem that needs to be
fixed. The pointers should not be null, you say, and you don't know why
they become null, so that's where the effort needs to go -- finding out
why, and stopping it.

--
Ben.

Stefan Ram

2015-04-23 00:22:31 UTC

Post by jeff
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

Maybe, dereferencing 0 might generate a SIGSEGV signal that
can be intercepted using means of <signal.h>?
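
A rough sketch of what such an interception could look like on a POSIX system, using sigaction() together with sigsetjmp()/siglongjmp(). Everything here is an assumption for illustration: run_guarded, fake_fault, and harmless are made-up names, and fake_fault uses raise(SIGSEGV) as a stand-in for a real fault, because actually dereferencing a null pointer is undefined behaviour in C and need not produce a signal at all:

```c
#define _POSIX_C_SOURCE 200809L

#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Saved context to return to when SIGSEGV arrives. */
static sigjmp_buf guard_env;

static void on_segv(int signo)
{
    (void)signo;
    /* Abandon the interrupted code and resume after sigsetjmp(). */
    siglongjmp(guard_env, 1);
}

/* Runs fn(). Returns 0 if fn finished normally, 1 if SIGSEGV was
   intercepted, -1 if the handler could not be installed. */
int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    int caught;

    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(guard_env, 1) == 0) {
        fn();
        caught = 0;
    } else {
        caught = 1;   /* arrived here via siglongjmp() */
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return caught;
}

/* Stand-in for a faulting routine; raises the signal explicitly. */
void fake_fault(void) { raise(SIGSEGV); }

/* A routine that does nothing and does not fault. */
void harmless(void) { }
```

Even where this works, whatever the interrupted routine was in the middle of doing is simply abandoned, so the program continues with state that nobody has checked.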

Or, you can modify the compiler to generate code for the
unary * operator that will have the intended effect.

There might be old operating systems, like old SunOS versions or
possibly BrandZ systems, that might possibly allow
dereferencing a null pointer (not generating a run-time
error). Maybe you can run your code under such an OS?

Nobody

2015-04-23 02:14:13 UTC

Post by jeff
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

In a sense, existing mechanisms already ignore "all code that dereferences
a null pointer", i.e. they terminate the process, resulting in the code
(i.e. the remainder of the program) being ignored. Maybe you meant to
suggest that it should ignore some smaller portion of the code? In which
case: which portion? How should the implementation determine that?

As for treating null dereference as a no-op: how can it? Pointer
dereference yields a value; what value should dereferencing a null pointer
yield? Just use whatever value happened to already be in the memory (or
register)? That's likely to just result in cascading failures; if the
value is itself a pointer, offset or index (or is used in the calculation
of such), you're going to end up accessing random memory locations.
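
The stale-value problem can be shown with a small sketch. The names (struct record, read_index_or_skip, index_in_range, TABLE_SIZE) are hypothetical. The first function imitates a "no-op on null" read: when the pointer is null the read is skipped and the destination keeps whatever it held before. The second shows the kind of range check the caller would then need before using that value as an index:

```c
#include <stddef.h>

/* Illustrative type, not taken from the thread. */
struct record { int index; };

#define TABLE_SIZE 4

/* Imitates "null dereference is a no-op": if rec is null, nothing is
   read and *out keeps its previous (possibly stale) value. */
void read_index_or_skip(const struct record *rec, int *out)
{
    if (rec != NULL)
        *out = rec->index;
}

/* Returns 1 if idx can safely index a TABLE_SIZE-element array. */
int index_in_range(int idx)
{
    return idx >= 0 && idx < TABLE_SIZE;
}
```

If the destination happened to hold 1234 before a skipped read, it still holds 1234 afterwards, and any code that trusts it as an index into a four-element table goes out of bounds.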

And simply testing for a null pointer and conditionalising code which
depends upon the value obtained by dereferencing is often a workaround
rather than a fix. If null isn't a legitimate value, any fix would deal
with the reasons why the pointer was null, not the consequences.

James Kuyper

2015-04-23 16:32:04 UTC

Post by Nobody

Post by jeff
My question is: is there a way to deal with all of these problems in one
swoop, by ignoring all code that dereferences a null pointer, i.e. null
dereference becomes a NO-OP instead of triggering an exception?

He seemed quite specific to me: "null dereference becomes a NO-OP".
That's a very small portion of the code, often just a single instruction.

...

Post by Nobody
As for treating null dereference as a no-op: how can it? ...
... Just use whatever value happened to already be in the memory (or
register)? That's likely to just result in cascading failures; if the
value is itself a pointer, offset or index (or is used in the calculation
of such), you're going to end up accessing random memory locations.

If that bothered him (as it should) he wouldn't be asking this question.