In article <0c2cd374-bfef-468f-ba36-de61bd9b408f@[EMAIL PROTECTED]>,
MacRules <dodeca001@[EMAIL PROTECTED]> wrote:
> <snip>
> Sorry - stupid mistake. I took a look at the ASM code - the compiler
> optimizer was smart enough to figure out that the variables inside
> these loops aren't used for anything, and deleted them. However, the
> optimizer wasn't smart enough to figure out that the resulting empty
> FOR loops aren't doing anything, either, so it left these in.
>
> I added some printf() calls to the bottom of the function, making all
> of the variables above 'relevant' and thus forcing the optimizer to
> leave the FOR loop contents intact:
>
> printf("a_f b_f c_f d_f e_f f_f = %f %f %f %f %f %f\n",
>        a_f,b_f,c_f,d_f,e_f,f_f);
> printf("a_d b_d c_d d_d e_d f_d = %lf %lf %lf %lf %lf %lf\n",
>        a_d,b_d,c_d,d_d,e_d,f_d);
>
> I then verified that the compiler is now doing the requested
> mathematical operations with floats and doubles.
>
> The new results:
>
> Number of iterations? 50000000
>
> Duration float, double = 643 361 ms
>
> I don't know why double would be twice as fast as float
> [...]
> In any case, my original conclusion still holds: I should upgrade
> to double.
Based on what you have told us here, I would not conclude that. Your logic
may still be flawed. My guess is that, if you changed your code to read and
write lots of different memory locations, you could reverse the timing
difference.
For example:

    float * a_fs = new float[1000000];
    float * b_fs = new float[1000000];
    float * c_fs = new float[1000000];
    ...
    for (int i = 0; i < 1000000; ++i) {
        c_fs[i] = a_fs[i] * b_fs[i];
    }

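The reason is that, in a loop like this, the number of bytes that have to
be read and written becomes the limiting factor for speed, and float arrays
move half as many bytes as double arrays of the same length (assuming the
usual 4-byte float and 8-byte double).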
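
If you want to try this, something along the following lines might do as a starting point. It is only a sketch: the names time_multiply_ms and bytes_moved are made up for illustration, I use std::vector rather than new[] so that the arrays hold defined values, and the checksum is there only to make it hard for the optimizer to throw the loop away as it did with your first test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Multiply two arrays element by element and return the time taken by the
// multiply loop in milliseconds. T is the element type (float or double).
// The sum of the results is written to 'checksum' so that the results are
// actually used after the timed loop.
template <typename T>
double time_multiply_ms(std::size_t n, double& checksum)
{
    assert(n > 0);
    // Filling the vectors up front also touches every page before timing.
    std::vector<T> a(n, T(1.5)), b(n, T(2.0)), c(n);

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] * b[i];
    }
    auto stop = std::chrono::steady_clock::now();

    checksum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        checksum += c[i];
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Bytes read and written by the timed loop: two input arrays, one output.
template <typename T>
std::size_t bytes_moved(std::size_t n)
{
    return 3 * n * sizeof(T);
}
```

Call time_multiply_ms<float>(1000000, sum) and time_multiply_ms<double>(1000000, sum) from your main(), print both durations together with sum, and repeat the runs a few times. I make no promise about which one wins on your machine; the point is only that the result can differ from your register-only test.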
I haven't checked, but there may also be effects from how your variables
are placed in cache lines.
Reinder

