>    float * a_fs = new float[1000000];
>    float * b_fs = new float[1000000];
>    float * c_fs = new float[1000000];
>    ...
>    for( int i = 0; i < 1000000; ++i) {
>      c_fs[i] = a_fs[i] * b_fs[ i];
>    }
>
> Reason would be that, for this, the number of bytes that have to be read
> and written will be the limiting factors in the speed you get.
>
Thanks for the feedback. This is a valid point. The code I inherited
is heavily memory intensive. I have plans to make it less so, which is
why I concentrated on the efficiency of arithmetic operations first.
For fun, I added two more tests, focusing only on float vs. double
memory I/O:
#define ASG_FD_TEST_ARRAY_LENGTH 2000000   // second attempt
//#define ASG_FD_TEST_ARRAY_LENGTH 32768   // first attempt
// float test: copy between two indices that sweep the array in opposite directions
test_f_e = (float *)malloc(ASG_FD_TEST_ARRAY_LENGTH*sizeof(float));
m1 = 0;                               // write index, walks upward
m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;    // read index, walks downward
for (n=0; n<num_iter; n++) {
    test_f_e[m1] = test_f_e[m2];
    m1++;
    if (m1 == ASG_FD_TEST_ARRAY_LENGTH)
        m1 = 0;
    m2--;
    if (m2 < 0)
        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;
}
free(test_f_e);
ASG_timing_stamp_advance(&timer);
// double test: same loop, same element count, twice the bytes per element
test_d_e = (double *)malloc(ASG_FD_TEST_ARRAY_LENGTH*sizeof(double));
m1 = 0;                               // write index, walks upward
m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;    // read index, walks downward
for (n=0; n<num_iter; n++) {
    test_d_e[m1] = test_d_e[m2];
    m1++;
    if (m1 == ASG_FD_TEST_ARRAY_LENGTH)
        m1 = 0;
    m2--;
    if (m2 < 0)
        m2 = ASG_FD_TEST_ARRAY_LENGTH - 1;
}
free(test_d_e);
ASG_timing_stamp_advance(&timer);
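In case anyone wants to try this without my timing helpers, here is a rough self-contained sketch of the same loop. The function names and the DEFINE_COPY_TEST macro are made up for this post, clock() from <time.h> stands in for ASG_timing_stamp_advance (so it reports CPU time, not wall-clock time), and calloc replaces malloc so the loop reads initialized memory. It is not the exact code that produced the numbers below.

```c
#include <stdlib.h>
#include <time.h>

/* Defines NAME(length, num_iter): runs the copy loop on an array of TYPE
 * and returns the elapsed CPU time in milliseconds, or -1.0 if length is
 * not positive or the array cannot be allocated. */
#define DEFINE_COPY_TEST(NAME, TYPE)                                   \
double NAME(long length, long num_iter)                                \
{                                                                      \
    if (length <= 0) return -1.0;                                      \
    TYPE *raw = calloc((size_t)length, sizeof(TYPE));                  \
    if (raw == NULL) return -1.0;                                      \
    /* volatile keeps the compiler from deleting the copies */         \
    volatile TYPE *buf = raw;                                          \
    long m1 = 0, m2 = length - 1;                                      \
    clock_t start = clock();                                           \
    for (long n = 0; n < num_iter; n++) {                              \
        buf[m1] = buf[m2];              /* one read, one write */      \
        if (++m1 == length) m1 = 0;     /* wrap the upward index */    \
        if (--m2 < 0) m2 = length - 1;  /* wrap the downward index */  \
    }                                                                  \
    clock_t stop = clock();                                            \
    free(raw);                                                         \
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;           \
}

DEFINE_COPY_TEST(copy_test_float_ms, float)
DEFINE_COPY_TEST(copy_test_double_ms, double)
```

Calling copy_test_float_ms(2000000, num_iter) and copy_test_double_ms(2000000, num_iter) with the same num_iter gives the two durations to compare. The absolute numbers will depend on the compiler, optimization flags, and machine.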
Result #1: when the float and double test array lengths are 32768
MEM I/O: duration float, double = 117 116 ms
Result #2: when the float and double test array lengths are 2000000
MEM I/O: duration float, double = 106 225 ms
Discussion: for result #1, I suspect that the processor's internal
cache was absorbing my reads and writes before they reached external
memory, so floats and doubles performed the same. For result #2, the
longer arrays defeated the cache and forced many external memory reads
and writes. The doubles then took about twice as long as the floats
(225 ms vs. 106 ms), as you predicted.
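To put rough numbers on the cache argument, here is a small sketch that works out how many bytes each test array occupies. It assumes 4-byte floats and 8-byte doubles, which is typical but not guaranteed by the C standard. The helper names and the cache_bytes parameter are hypothetical; I have not looked up the actual cache size of my processor.

```c
#include <stddef.h>

/* Bytes occupied by a test array of `length` elements of `elem_size` bytes. */
size_t working_set_bytes(size_t length, size_t elem_size)
{
    return length * elem_size;
}

/* Nonzero if the whole array fits in a cache of `cache_bytes` bytes.
 * cache_bytes is a hypothetical parameter: substitute the real figure
 * for the processor under test. */
int fits_in_cache(size_t length, size_t elem_size, size_t cache_bytes)
{
    return working_set_bytes(length, elem_size) <= cache_bytes;
}
```

Under those size assumptions, 32768 elements come to 131072 bytes of floats or 262144 bytes of doubles, while 2000000 elements come to 8000000 bytes of floats or 16000000 bytes of doubles.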
Whether this matters for my particular application is unclear. I'm
still leaning toward an upgrade to doubles.
Thanks,
Steve

