Post by David Brown
Post by fir
float takes less space, and when you keep arrays of floats, float is
surely better (less space and less memory bandwidth, so I guess floats
can be twice as fast in some respects)
Certainly if you have a lot of them, then the memory bandwidth and cache
hit rate can make floats faster than doubles.
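As a rough illustration of the footprint argument, here is a minimal sketch. The 64-byte cache line is an assumption (common on x86, not universal), and the helper names are made up for this example.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; 64 is common on x86 but not universal. */
#define CACHE_LINE_BYTES 64

/* How many elements of each type fit in one (assumed) cache line. */
size_t floats_per_line(void)  { return CACHE_LINE_BYTES / sizeof(float); }
size_t doubles_per_line(void) { return CACHE_LINE_BYTES / sizeof(double); }

/* Bytes that must travel through the memory system for an n-element array. */
size_t array_bytes_float(size_t n)  { return n * sizeof(float); }
size_t array_bytes_double(size_t n) { return n * sizeof(double); }
```

On the usual platforms where float is 4 bytes and double is 8, the double array is twice the size and each cache line holds half as many elements.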
Post by fir
but when you do calculations on local variables rather than arrays of
floats, is double slower?
I assume that for the calculations in question, the accuracy and range
of float is enough - otherwise the answer is obviously use doubles.
This is going to depend on the cpu, the type of instructions, the source
code in question, the compiler and the options. So there is no single
easy answer.
You can, as Bonita suggested, look up instruction timing information at
agner.org for the cpu you are using (assuming it's an x86 device) to get
some idea of any fundamental differences in timings. Usually for modern
"big" processors, basic operations such as addition and multiplication
are single cycle or faster (i.e., multiple instructions can be done in
parallel) for float and double. But division, square root, and other
more complex operations can take a lot longer with doubles.
Next, consider if you can be using vector or SIMD operations. On some
devices, you can do that with floats but not doubles - and even if you
can use doubles, you can usually run floats at twice the rate.
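To make the SIMD point concrete, here is a sketch of a loop that a compiler may auto-vectorise when optimisation is enabled. The function names are hypothetical. A 128-bit vector register holds four floats but only two doubles, which is where the "twice the rate" comes from when vectorisation happens at all.

```c
#include <stddef.h>

/* y[i] += a * x[i] in single precision.
   restrict promises the arrays do not overlap, which helps vectorisation. */
void axpy_f(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* The same loop in double precision: half as many elements per vector. */
void axpy_d(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Whether either loop is actually vectorised depends on the compiler, the flags and the target cpu, so it is worth checking the generated assembly rather than assuming.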
In the source code, remember it is very easy to accidentally promote to
double when writing in C. If you want to stick to floats, make sure you
don't use double-precision constants - a missing "f" suffix can change a
whole expression into double calculations. Remember that it takes time
to convert between float and double.
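A small sketch of the missing "f" problem follows. The function names are invented for illustration. Applying sizeof to an expression reports the size of its type without evaluating it, which makes the promotion visible.

```c
#include <stddef.h>

/* The unsuffixed constant 0.5 has type double, so x is converted to
   double, the multiply is done in double, and the result is converted
   back to float on return. */
float half_promoted(float x) { return x * 0.5; }

/* With the f suffix the whole expression stays in float. */
float half_float(float x) { return x * 0.5f; }

/* sizeof reports the size of the expression's type without evaluating it. */
size_t promoted_expr_size(float x) { return sizeof(x * 0.5); }
size_t float_expr_size(float x)    { return sizeof(x * 0.5f); }
```

Both functions return the same value here; the difference is the two extra conversions and the double-precision multiply in the first one.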
Then look at your compiler flags - these can make a big difference to
the speed of floating point code. I'm giving gcc flags, because those
are the ones I know - if you are using another compiler, look at the
details of its flags.
Obviously you want optimisation enabled if speed is relevant - -O2 is a
good start. Make sure you are optimising for the cpu(s) you are using -
"-march=native" is good for local programs, but you will want something
more specific if the binary needs to run on a variety of machines. The
closer you are to the exact cpu model, the better the code scheduling
and instruction choice can be.
Look closely at "-ffast-math" in the gcc manual. If that is suitable
for your code (and it often is), it can make a huge difference to
floating point intensive code. If it is unsuitable because you have
infinities, or need deterministic control of things like associativity,
it will make your results wrong.
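The associativity point can be shown with a short sketch. The function names are made up, and the stated results assume IEEE 754 single-precision arithmetic compiled without "-ffast-math".

```c
/* Floating point addition is not associative: these two functions can
   return different results for the same inputs. */
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left gives 1 because the large values cancel first, while sum_right gives 0 because the 1 is lost when it is added to -1e8f. With "-ffast-math" the compiler is allowed to reassociate, so you can no longer rely on getting one result or the other.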
"-Wdouble-promotion" can be helpful to spot accidental use of doubles in
what you think is a float expression. "-Wfloat-equal" is a good idea,
especially if you are mixing floats and doubles. "-Wfloat-conversion"
will warn about implicit conversions from doubles to floats (or to
integers).
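As a sketch of what these warnings catch, the fragment below contains one line for each. The file name in the comment and the function names are hypothetical.

```c
/* Compile with, for example:
   gcc -O2 -Wdouble-promotion -Wfloat-equal -Wfloat-conversion -c warn_demo.c */

/* -Wdouble-promotion: x is implicitly promoted to double by the
   unsuffixed constant.
   -Wfloat-conversion: the double result is narrowed back to float. */
float scale(float x)
{
    return x * 1.1;
}

/* -Wfloat-equal: equality comparison on floating point values. */
int is_unit(float x)
{
    return x == 1.0f;
}

/* -Wfloat-conversion: implicit conversion from double to int. */
int truncate_to_int(double d)
{
    return d;
}
```

The code is valid C and compiles without complaint by default; the warnings only appear when the flags are given.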
The code that seemed to speed up a bit when I changed float to double is:
union Color
{
    unsigned u;
    struct { unsigned char b,g,r,a; };
};

inline float distance2d_(float x1, float y1, float x2, float y2)
{
    return sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}

inline unsigned GetPixelUnsafe_(int x, int y)
{
    return frame_bitmap[y*frame_size_x+x];
}

inline void SetPixelUnsafe_(int x, int y, unsigned color)
{
    frame_bitmap[y*frame_size_x+x]=color;
}

void DrawPoint(int i)
{
    // if(!point[i].enabled) return;
    int xq = point[i].x;
    int yq = point[i].y;
    Color c;
    Color bc;
    if(d_toggler)
    {
        // DrawCircle(xq,yq,point[i].radius,0xffffff);
        FillCircle(xq,yq,point[i].radius,point[i].c.u);
        return;
    }
    float R = point[i].radius*5;
    int y_start = max(0, yq-R);
    int y_end = min(frame_size_y, yq+R);
    int x_start = max(0, xq-R);
    int x_end = min(frame_size_x, xq+R);
    for(int y = y_start; y<y_end; y++)
    {
        for(int x = x_start; x<x_end; x++)
        {
            // here, p below was float ->
            double p = (R - distance2d_(x,y,point[i].x,point[i].y));
            if(!i_toggler)
            {
                if(p<0.4*R) continue;
            }
            else
                if(p<0) continue;
            p/=R;
            bc.u = GetPixelUnsafe_(x,y);
            int r = bc.r + (point[i].c.r)* p*p*p;
            int g = bc.g + (point[i].c.g)* p*p*p;
            int b = bc.b + (point[i].c.b)* p*p*p;
            if(!r_toggler)
            {
                if(r>255) r = 255;
                if(g>255) g = 255;
                if(b>255) b = 255;
            }
            c.r = r;
            c.g = g;
            c.b = b;
            SetPixelUnsafe_(x,y,c.u);
        }
    }
}
This just draws something like a little light that darkens as 1/(r*r*r)
and is able to add n lights in place to mix colors and eventually
"overlight" (so this is kind of blending).
It is very time consuming: drawing 100 of them (when r is 9) was
taking 35 ms on an old machine, AFAIR.
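For comparison, here is a float-only sketch of the per-pixel arithmetic. It is not the code above: the helper names are hypothetical and the pixel access is left out. Every constant carries an "f" suffix so nothing is promoted to double. Note that the weight the code actually computes is ((R - d)/R) cubed, where d is the distance from the light centre.

```c
/* Cubic falloff weight for a pixel at distance d from a light with
   reach R (R > 0 assumed). Returns 0 outside the reach. */
float falloff_weight(float R, float d)
{
    float p = (R - d) / R;
    return (p < 0.0f) ? 0.0f : p * p * p;
}

/* Add a weighted light channel onto a background channel and saturate
   at 255, which matches the clamped branch of the loop above. */
unsigned char blend_channel(unsigned char background,
                            unsigned char light, float weight)
{
    int v = background + (int)(light * weight);
    return (unsigned char)(v > 255 ? 255 : v);
}
```

For example, a background of 100 plus a light channel of 200 at weight 0.5f gives 200, while a background of 200 with the same light saturates at 255. Whether this is faster than the mixed float and double version would have to be measured on the machine in question.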