Discussion:
is double slower?
fir
2024-11-04 07:53:00 UTC
float takes less space, and when you keep arrays of floats then float is
certainly better (less space and less memory bandwidth, so I guess floats
can be up to twice as fast in some respects)

but when you do calculations on local variables rather than arrays, is
double slower?
Chris M. Thomasson
2024-11-04 09:27:01 UTC
Post by fir
float takes less space, and when you keep arrays of floats then float is
certainly better (less space and less memory bandwidth, so I guess floats
can be up to twice as fast in some respects)
but when you do calculations on local variables rather than arrays, is
double slower?
Ask the GPU.
fir
2024-11-04 14:43:30 UTC
Post by Chris M. Thomasson
Ask the GPU.
why? as for the CPU, I'm not sure, since on older CPUs the calculations
were done on double hardware anyway (?, I'm not so sure); even if you
passed a float to a function, I'm not sure whether at the assembly level
you weren't actually passing a double

then after SSE, afair, you got scalar code for both floats and doubles
but I simply realized I don't know whether double calculations on local
variables (not arrays) are in fact noticeably slower
fir
2024-11-04 18:23:21 UTC
I'm writing a CPU-intensive experiment (something like alpha blending
images, mostly on the CPU), and interestingly, when I changed float to
double in that routine it sped up (as far as I can see, since I don't
have time for much testing): changing to double turned 35 ms per frame
into 34 ms per frame
Chris M. Thomasson
2024-11-04 20:48:20 UTC
Post by fir
I'm writing a CPU-intensive experiment (something like alpha blending
images, mostly on the CPU), and interestingly, when I changed float to
double in that routine it sped up (as far as I can see, since I don't
have time for much testing): changing to double turned 35 ms per frame
into 34 ms per frame
Well, do you need double precision anyway?
Chris M. Thomasson
2024-11-04 20:53:30 UTC
Post by fir
why? as for the CPU, I'm not sure, since on older CPUs the calculations
were done on double hardware anyway (?, I'm not so sure); even if you
passed a float to a function, I'm not sure whether at the assembly level
you weren't actually passing a double
In the realm of shaders float is the way. double is not always there.
Post by fir
then after SSE, afair, you got scalar code for both floats and doubles
but I simply realized I don't know whether double calculations on local
variables (not arrays) are in fact noticeably slower
Bonita Montero
2024-11-04 15:54:21 UTC
Post by fir
float takes less space, and when you keep arrays of floats then float is
certainly better (less space and less memory bandwidth, so I guess floats
can be up to twice as fast in some respects)
but when you do calculations on local variables rather than arrays, is
double slower?
Look at the instruction tables at agner.org.
David Brown
2024-11-05 08:14:15 UTC
Post by fir
float takes less space, and when you keep arrays of floats then float is
certainly better (less space and less memory bandwidth, so I guess floats
can be up to twice as fast in some respects)
Certainly if you have a lot of them, then the memory bandwidth and cache
hit rate can make floats faster than doubles.
Post by fir
but when you do calculations on local variables rather than arrays, is
double slower?
I assume that for the calculations in question, the accuracy and range
of float is enough - otherwise the answer is obviously use doubles.


This is going to depend on the cpu, the type of instructions, the source
code in question, the compiler and the options. So there is no single
easy answer.

You can, as Bonita suggested, look up instruction timing information at
agner.org for the cpu you are using (assuming it's an x86 device) to get
some idea of any fundamental differences in timings. Usually for modern
"big" processors, basic operations such as addition and multiplication
are single cycle or faster (i.e., multiple instructions can be done in
parallel) for float and double. But division, square root, and other
more complex operations can take a lot longer with doubles.

Next, consider if you can be using vector or SIMD operations. On some
devices, you can do that with floats but not doubles - and even if you
can use doubles, you can usually run floats at twice the rate.
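
As a rough illustration of the "twice the rate" point: a vector register of a given width holds twice as many floats as doubles. The sketch below is not from the thread; the function names are made up, and the register width (for example 16 bytes for SSE, 32 for AVX) is an assumption about the target.

```c
#include <stddef.h>

/* Hypothetical helper: how many elements of a given size fit in one
   vector register. register_bytes is an assumption about the target
   (e.g. 16 for SSE, 32 for AVX). */
size_t lanes_per_register(size_t register_bytes, size_t element_bytes)
{
    return register_bytes / element_bytes;
}

/* A simple loop of the kind compilers can often auto-vectorize when
   optimisation is enabled; with floats each vector instruction covers
   twice as many elements as it would with doubles. */
void scale_floats(float *restrict data, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= k;
}
```

Whether the compiler actually vectorizes such a loop depends on the flags and the target, so it is worth checking the generated assembly.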


In the source code, remember it is very easy to accidentally promote to
double when writing in C. If you want to stick to floats, make sure you
don't use double-precision constants - a missing "f" suffix can change a
whole expression into double calculations. Remember that it takes time
to convert between float and double.
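
A small sketch of the missing-suffix effect, using hypothetical helper names that are not from the thread. sizeof reports the type the expression would be evaluated in, without evaluating it.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only. */

size_t size_with_double_constant(float x)
{
    /* 0.5 is a double constant, so x is promoted and the product is double */
    return sizeof(x * 0.5);
}

size_t size_with_float_constant(float x)
{
    /* 0.5f is a float constant, so the product stays float */
    return sizeof(x * 0.5f);
}
```

The first function returns sizeof(double) and the second sizeof(float), which shows how one constant decides the type of the whole expression.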


Then look at your compiler flags - these can make a big difference to
the speed of floating point code. I'm giving gcc flags, because those
are the ones I know - if you are using another compiler, look at the
details of its flags.

Obviously you want optimisation enabled if speed is relevant - -O2 is a
good start. Make sure you are optimising for the cpu(s) you are using -
"-march=native" is good for local programs, but you will want something
more specific if the binary needs to run on a variety of machines. The
closer you are to the exact cpu model, the better the code scheduling
and instruction choice can be.

Look closely at "-ffast-math" in the gcc manual. If that is suitable
for your code (and it often is), it can make a huge difference to
floating point intensive code. If it is unsuitable because you have
infinities, or need deterministic control of things like associativity,
it will make your results wrong.
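
To see why associativity matters, here is a minimal sketch (function names are made up for illustration) in which the grouping of a float sum changes the result under normal IEEE rules. With "-ffast-math" the compiler is allowed to regroup such sums, so code that relies on one particular grouping can change behaviour.

```c
/* (a + b) + c, evaluated left to right */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* a + (b + c), with the last two terms grouped first */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, the first grouping cancels the large terms before adding 1, while in the second the 1 is lost when added to -1e20f, so the two functions return different values when compiled without "-ffast-math".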

"-Wdouble-promotion" can be helpful to spot accidental use of doubles in
what you think is a float expression. "-Wfloat-equal" is a good idea,
especially if you are mixing floats and doubles. "-Wfloat-conversion"
will warn about implicit conversions from doubles to floats (or to
integers).
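
A minimal sketch of code that these warnings are meant to flag. The file name in the comment and the function names are hypothetical, and the exact diagnostics depend on the gcc version.

```c
/* Compile with, for example:
     gcc -O2 -march=native -Wdouble-promotion -Wfloat-conversion -Wfloat-equal -c warn_demo.c
   (warn_demo.c is a hypothetical file name). */

float half_of(float x)
{
    /* -Wdouble-promotion: x is promoted to double because 0.5 is a double.
       -Wfloat-conversion: the double result is narrowed back to float. */
    return x * 0.5;
}

int same_value(float a, double b)
{
    /* -Wfloat-equal: floating-point values compared with == */
    return a == b;
}
```

Writing 0.5f in half_of should silence both warnings for that function and keep the whole calculation in float.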
fir
2024-11-05 09:49:18 UTC
the code that seemed to speed up a bit when changing float to double is

union Color
{
    unsigned u;
    struct { unsigned char b,g,r,a;};
};

inline float distance2d_(float x1, float y1, float x2, float y2)
{
    return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}

inline unsigned GetPixelUnsafe_(int x, int y)
{
    return frame_bitmap[y*frame_size_x+x];
}

inline void SetPixelUnsafe_(int x, int y, unsigned color)
{
    frame_bitmap[y*frame_size_x+x]=color;
}

void DrawPoint(int i)
{
    // if(!point[i].enabled) return;

    int xq = point[i].x;
    int yq = point[i].y;

    Color c;
    Color bc;

    if(d_toggler)
    {
        // DrawCircle(xq,yq,point[i].radius,0xffffff);
        FillCircle(xq,yq,point[i].radius,point[i].c.u);

        return;
    }

    float R = point[i].radius*5;

    int y_start = max(0, yq-R);
    int y_end = min(frame_size_y, yq+R);
    int x_start = max(0, xq-R);
    int x_end = min(frame_size_x, xq+R);

    for(int y = y_start; y<y_end; y++)
    {
        for(int x = x_start; x<x_end; x++)
        {
            //here below was float ->
            double p = (R - distance2d_(x,y,point[i].x,point[i].y));

            if(!i_toggler)
            {
                if(p<0.4*R) continue;
            }
            else
                if(p<0) continue;

            p/=R;

            bc.u = GetPixelUnsafe_(x,y);
            int r = bc.r + (point[i].c.r)* p*p*p;
            int g = bc.g + (point[i].c.g)* p*p*p;
            int b = bc.b + (point[i].c.b)* p*p*p;

            if(!r_toggler)
            {
                if(r>255) r = 255;
                if(g>255) g = 255;
                if(b>255) b = 255;
            }

            c.r = r;
            c.g = g;
            c.b = b;

            SetPixelUnsafe_(x,y,c.u);
        }
    }
}

this just draws something like a little light that darkens as 1/(r*r*r)
and can add n lights in one place to mix colors and eventually
"overlight" (so this is kind of blending)

it's very time consuming: drawing 100 of them (when r is 9) was
taking 35 ms on an old machine, afair
fir
2024-11-05 09:51:19 UTC
Post by fir
this just draws something like a little light that darkens as 1/(r*r*r)
and can add n lights in one place to mix colors and eventually
"overlight" (so this is kind of blending)
it's very time consuming: drawing 100 of them (when r is 9) was
taking 35 ms on an old machine, afair
right now I can't test it (I must work on some other problems), but my
previous belief that float is never slower is possibly now in
question
fir
2024-11-05 10:03:10 UTC
anyone can test it, BTW


https://drive.google.com/file/d/1-Obb6F19h5yfCbCETP4-VFoV3XYGpRsN/view?usp=sharing

it's for Windows but works under Wine, afair, and also in a Linux virtual
machine on Windows (afair; I don't know, as I've only got Windows)
fir
2024-11-06 19:20:41 UTC
Post by fir
anyone can test it, BTW
https://drive.google.com/file/d/1-Obb6F19h5yfCbCETP4-VFoV3XYGpRsN/view?usp=sharing
it's for Windows but works under Wine, afair, and also in a Linux virtual
machine on Windows (afair; I don't know, as I've only got Windows)
you may also see it on YouTube if you're afraid to run the app (though
the app is much better)

http://youtu.be/7_Fodb7ivZY
Chris M. Thomasson
2024-11-06 20:55:42 UTC
Post by fir
you may also see it on YouTube if you're afraid to run the app (though
the app is much better)
http://youtu.be/7_Fodb7ivZY
Pretty nice! :^)
David Brown
2024-11-05 10:25:15 UTC
Post by fir
the code that seemed to speed up a bit when changing float to double is
I've tried to snip the bits that are important here.
Post by fir
inline float distance2d_(float x1, float y1, float x2, float y2)
 {
  return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
 }
What happens here depends on what #include files you use. If you have
#include <math.h>, then "sqrt" is defined with doubles. So the
sum-of-squares expression is calculated using floats. Then this sum is
converted to a double (taking an extra instruction or two) before
calling double-precision sqrt. Then it is converting that result back
to float to return it.

If you have "#include <tgmath.h>", then "sqrt" here will be done as
float sqrtf, rather than double. But the library version of sqrtf()
might actually call sqrt (double). If you want to be sure, be explicit
with sqrtf().

And on many platforms, sqrt (float or double) uses a library function
for full IEEE compatibility. With "-ffast-math", you are telling the
compiler you promise that the operand for "sqrt" will be "nice", and it
can use a single hardware sqrt instruction. This will likely be a lot
faster, especially if the float version is used. (Disclaimer - I
haven't looked at this on modern x86 targets. Check yourself - I
recommend putting your code into godbolt.org and examining the assembly.)


In the code that uses this function, you are starting with integer types
that need to be converted to float to pass to the distance function, and
the result of the call is used in a float expression before being
converted to double.

In short, it is a complete mess of conversions. And unless you are
using something like gcc's "-ffast-math" to say "don't worry about the
minor details of IEEE, optimise akin to integer arithmetic", then the
compiler has to generate all these back-and-forth conversions.


Being consistent in your types is going to improve things, whether you
use floats or doubles. You might even be better off using integer
arithmetic in some points.
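
One place where integer arithmetic might fit is the early rejection test: checking whether a pixel lies outside the radius does not need the square root at all. This is only a sketch under the assumption that the centre and radius are available as integers; outside_radius is a hypothetical helper, not part of the posted code.

```c
/* Returns 1 when pixel (x, y) lies outside the circle of the given
   radius around (cx, cy). Comparing squared distances avoids sqrt and
   any int/float conversions. Assumes the values are small enough that
   the squares do not overflow int. */
int outside_radius(int x, int y, int cx, int cy, int radius)
{
    int dx = x - cx;
    int dy = y - cy;
    return dx * dx + dy * dy > radius * radius;
}
```

This corresponds to the "p<0" test in the loop, since p is negative exactly when the distance exceeds R; the pixels that pass still need the real distance for the falloff.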
Post by fir
     //here below was float ->
    double p = (R - distance2d_(x,y,point[i].x,point[i].y));
fir
2024-11-05 10:42:38 UTC
well, that's interesting.. in particular I was unaware of this sqrtf; I'll
look at it a bit later

as for -ffast-math, I didn't notice a difference, though I wasn't testing
it beyond a quick look.. I used it years back, but later disabled it
because I got a bug in some code which, afair, was caused by it
(I'm not sure though; today I rarely code at all, so I'm not too fresh
on various tests)

in fact I could optimise it much harder just by building a table with that
fading circle of size 45x45 and doing a lookup there (back then I was
doing a big dose of this level of optimisation, but I know it's something
to do at the final stage of an app, as it generally makes it harder to
work on the code live and test various changes; as a final stage, though,
it's generally worth it if something runs 30-50% faster)
fir
2024-11-05 13:42:22 UTC
Permalink
Post by fir
Post by David Brown
Post by fir
Post by David Brown
Post by fir
float takes less space and when you keep arrays of floats for sure float
is better (less spase and uses less memory bandwidth so i guess floats
can be as twice faster in some aspects)
Certainly if you have a lot of them, then the memory bandwidth and cache
it rate can make floats faster than doubles.
Post by fir
but when you do calculations on local variables not floats do the
double is slower?
I assume that for the calculations in question, the accuracy and range
of float is enough - otherwise the answer is obviously use doubles.
This is going to depend on the cpu, the type of instructions, the source
code in question, the compiler and the options. So there is no single
easy answer.
You can, as Bonita suggested, look up instruction timing information at
agner.org for the cpu you are using (assuming it's an x86 device) to get
some idea of any fundamental differences in timings. Usually for modern
"big" processors, basic operations such as addition and multiplication
are single cycle or faster (i.e., multiple instructions can be done in
parallel) for float and double. But division, square root, and other
more complex operations can take a lot longer with doubles.
Next, consider if you can be using vector or SIMD operations. On some
devices, you can do that with floats but not doubles - and even if you
can use doubles, you can usually run floats at twice the rate.
In the source code, remember it is very easy to accidentally promote to
double when writing in C. If you want to stick to floats, make sure you
don't use double-precision constants - a missing "f" suffix can change a
whole expression into double calculations. Remember that it takes time
to convert between float and double.
Then look at your compiler flags - these can make a big difference to
the speed of floating point code. I'm giving gcc flags, because those
are the ones I know - if you are using another compiler, look at the
details of its flags.
Obviously you want optimisation enabled if speed is relevant - -O2 is a
good start. Make sure you are optimising for the cpu(s) you are using -
"-march=native" is good for local programs, but you will want something
more specific if the binary needs to run on a variety of machines. The
closer you are to the exact cpu model, the better the code scheduling
and instruction choice can be.
Look closely at "-ffast-math" in the gcc manual. If that is suitable
for your code (and it often is), it can make a huge difference to
floating point intensive code. If it is unsuitable, because you rely on
infinities or NaNs, or need deterministic control of things like
associativity, it can make your results wrong.
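To show why associativity matters, here is a minimal sketch (the function names are made up for this example). Floating point addition is not associative, so a compiler that is allowed to regroup a sum can change the result:

```c
/* Hypothetical helpers: the same three values, grouped differently. */

/* (a + b) + c */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* a + (b + c) */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}
```

With a = 1e30f, b = -1e30f and c = 1.0f, strict IEEE evaluation gives 1.0f for sum_left (the big values cancel first) and 0.0f for sum_right (the 1.0f is lost when added to -1e30f). Under "-ffast-math" the compiler is free to treat both groupings as the same thing.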
"-Wdouble-promotion" can be helpful to spot accidental use of doubles in
what you think is a float expression. "-Wfloat-equal" is a good idea,
especially if you are mixing floats and doubles. "-Wfloat-conversion"
will warn about implicit conversions from doubles to floats (or to
integers).
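As a rough illustration of what these warnings catch, here is a sketch with a hypothetical file name and function names. The build line in the comment assumes gcc. The exact warning text will depend on the compiler version.

```c
/* Hypothetical file warn_demo.c. A possible build line, assuming gcc:
 *   gcc -O2 -Wdouble-promotion -Wfloat-conversion -Wfloat-equal -c warn_demo.c
 */

/* 3.14159 is a double constant, so radius is implicitly promoted to
   double (-Wdouble-promotion) and the double result is implicitly
   converted back to float on return (-Wfloat-conversion). */
float area_mixed(float radius)
{
    return 3.14159 * radius * radius;
}

/* With the "f" suffix everything stays in float and neither warning
   should fire. */
float area_float(float radius)
{
    return 3.14159f * radius * radius;
}

/* Direct equality comparison of floating point values is what
   -Wfloat-equal is meant to flag. */
int same_length(float a, float b)
{
    return a == b;
}
```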
Post by fir
the code that seemed to speed up a bit when turning float to double is
I've tried to snip the bits that are important here.
Post by fir
inline float distance2d_(float x1, float y1, float x2, float y2)
{
return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}
What happens here depends on what #include files you use. If you have
#include <math.h>, then "sqrt" is defined with doubles. So the
sum-of-squares expression is calculated using floats. Then this sum is
converted to a double (taking an extra instruction or two) before
calling double-precision sqrt. Then that result is converted back to
float to return it.
If you have "#include <tgmath.h>", then "sqrt" here will be done as
float sqrtf, rather than double. But the library version of sqrtf()
might actually call sqrt (double). If you want to be sure, be explicit
with sqrtf().
And on many platforms, sqrt (float or double) uses a library function
for full IEEE compatibility. With "-ffast-math", you are telling the
compiler you promise that the operand for "sqrt" will be "nice", and it
can use a single hardware sqrt instruction. This will likely be a lot
faster, especially if the float version is used. (Disclaimer - I
haven't looked at this on modern x86 targets. Check yourself - I
recommend putting your code into godbolt.org and examining the assembly.)
In the code that uses this function, you are starting with integer types
that need to be converted to float to pass to the distance function, and
the result of the call is used in a float expression before being
converted to double.
In short, it is a complete mess of conversions. And unless you are
using something like gcc's "-ffast-math" to say "don't worry about the
minor details of IEEE, optimise akin to integer arithmetic", then the
compiler has to generate all these back-and-forth conversions.
Being consistent in your types is going to improve things, whether you
use floats or doubles. You might even be better off using integer
arithmetic in some points.
Post by fir
// here below was float ->
double p = (R - distance2d_(x,y,point[i].x,point[i].y));
well that's interesting.. especially i was unaware of this sqrtf, i will
see a bit later
as to -ffast-math i didn't notice a difference, though i was not testing
it beyond a simple look.. i used it years back but later i disabled
it as i got some bug in one program which was afair caused by it
(im not sure though, today i rarely code at all so im not too fresh
on various tests)
in fact i could optimise it much harder just by building a table with
that fading circle of size 45x45 and doing a lookup there (back then i
was doing a big dose of this level of optimisation, but by now i know it
is something to do at the final stage of an app, as it generally makes it
harder to work on it live and test various changes, but at the final
stage it's generally worth it if something runs 30-50% faster)
ok i tested it, though not extensively (it depends on how many of those
lights overlap each other etc)

the version i had was 17 ms; both full doubles and full floats (with
sqrtf) are 15 ms, so it's a notable speedup, this mixing seems to have
been wrong
as to which is faster, float or double, it's 15 ms for both, though
sometimes the float version blinks to 16 and the double sometimes blinks
to 14.. so maybe double is slightly faster

it probably depends on which machine it runs on, as those times really
vary on different cpus afaik
Bonita Montero
2024-12-09 13:50:14 UTC
Permalink
Post by David Brown
You can, as Bonita suggested, look up instruction timing information at
agner.org for the cpu you are using (assuming it's an x86 device) to get
some idea of any fundamental differences in timings.  Usually for modern
"big" processors, basic operations such as addition and multiplication
are single cycle or faster (i.e., multiple instructions can be done in
parallel) for float and double.  But division, square root, and other
more complex operations can take a lot longer with doubles.
Agner.org says that a SQRTS/PD is 21 clock cycles and a SQRTS/PS is 15
clock cycles; that's not much difference.
Bonita Montero
2024-12-09 14:10:15 UTC
Permalink
Post by Bonita Montero
Agner.org says that a SQRTS/PD is 21 clock cycles and a
SQRTS/PS is 15 clock cycles; that's not much difference.
And DIVS/PD is only zero to two cycles slower than DIVS/PS.
