
*Post by bitrex*
*Post by Lynn McGuire*
"Sometimes Floating Point Math is Perfect"

https://randomascii.wordpress.com/2017/06/19/sometimes-floating-point-math-is-perfect/

Interesting. We moved to 64 bit doubles a couple of decades ago and have never regretted it.

Lynn

"If the two constants being added had been exact then there would only have been one rounding in the calculation and the result would have matched the literal on the right-hand side."

What does "exact" mean in this context?

In this context it means... er... exact -- without any error.

*Post by bitrex*
How is writing 98432341293.375 + 0.000244140625 more "exact" than 0.2 + 0.3?

Neither the quoted paragraph nor the blog post uses the term "more exact". What's more, the blog post does not compare 98432341293.375 + 0.000244140625 with 0.2 + 0.3, but with 0.1 + 0.2.

98432341293.375 is binary 1011011101011000001100101010100101101.011. This can be represented exactly (without error) in a C double and, if the C implementation conforms to the IEEE floating-point recommendations, it will be exactly represented. Likewise, 0.000244140625 is exactly binary .000000000001 (that is, 2^-12), and the rules of IEEE arithmetic say that the sum must be rounded to the nearest representable value. In fact, the sum can be represented exactly in a C double, so "rounding to the nearest representable value" means, in this case, giving the exact answer.

With 0.1 + 0.2 there are three places where accuracy is lost. 0.1 can not be represented exactly in a binary double and neither can 0.2. Both will be represented by the nearest possible floating-point number, but neither is exact. Finally, the sum of those two closest-but-not-quite numbers can not be exactly represented either, giving a third loss of accuracy.

I leave your example, 0.2 + 0.3, for you to analyse yourself.

--

Ben.