Thanks, David. I was kind of guessing as much. I'll bet it would take quite a bit
more than just running over relatively small loops to really see PyOpenCL shine.
By the way, I noticed the benchmark-all.py example has a couple of nested for loops in the
non-PyOpenCL section. Exactly the same array is generated with just one of them. Is there
some logic behind the nested loops? It makes the example 10 times slower and is a bit
misleading, I think. When I removed one of them, the run without OpenCL was about twice as
fast as the run with my 2-core processor as the context, which makes sense.
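To illustrate the point about the redundant loop, here is a minimal sketch. It is not the actual benchmark-all.py code: the array size, the repeat count of 10 (picked to match the 10x slowdown mentioned above), and the function names are all assumptions.

```python
import random


def compute_nested(a, b, repeats=10):
    """Hypothetical stand-in for the nested-loop version: the outer loop
    recomputes every element `repeats` times with identical results."""
    c = [0.0] * len(a)
    for _ in range(repeats):          # redundant outer loop
        for i in range(len(a)):
            c[i] = a[i] + b[i]
            c[i] = c[i] * (a[i] + b[i])
            c[i] = c[i] * (a[i] / 2.0)
    return c


def compute_single(a, b):
    """Same arithmetic with the outer loop removed."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
        c[i] = c[i] * (a[i] + b[i])
        c[i] = c[i] * (a[i] / 2.0)
    return c


random.seed(0)
a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]

# Each element depends only on a[i] and b[i], so repeating the pass
# changes the run time but not the output.
same = compute_nested(a, b) == compute_single(a, b)
print("identical output:", same)
```

Because every pass overwrites c[i] from scratch, the extra loop only multiplies the work.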
Craig
________________________________________
From: David Garcia [david.rigel(a)gmail.com]
Sent: Thursday, September 17, 2009 3:27 PM
To: Swank, Craig
Cc: pyopencl(a)tiker.net
Subject: Re: [PyOpenCL] benchmark
Craig,
Even if you ran it on the GPU you could get worse results than with numpy. Performing
computations on the GPU requires quite a bit of orchestration, such as copying the data to
video memory and reading it back. You want to make your workload as close to this as
possible:
1. Load data into GPU.
2. Perform _lengthy_ computation on GPU.
3. Take the output from 2 and do some more heavy computation in the GPU. Repeat as
necessary.
4. Read back results.
Even if you are executing OpenCL on a CPU it will still have some overhead, so you want
your kernels to be significantly expensive to compute. Otherwise you won't see much
benefit.
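A rough back-of-the-envelope model of the point above, as a sketch only. The function names and every number in it (transfer cost, launch overhead, speedup factor) are made up for illustration and are not measurements of any real device.

```python
def offload_time(compute_s, transfer_s, launch_s, speedup, n_kernels=1):
    """Estimated wall time when offloading: one upload/readback pair,
    plus per-kernel launch overhead and accelerated compute."""
    return transfer_s + n_kernels * (launch_s + compute_s / speedup)


def host_time(compute_s, n_kernels=1):
    """Estimated wall time when everything stays on the host."""
    return n_kernels * compute_s


# Hypothetical costs, in seconds.
TRANSFER = 0.010   # copy data in and read results back
LAUNCH = 0.001     # fixed overhead per kernel launch
SPEEDUP = 20.0     # assumed raw compute speedup on the device

# One kernel of increasing cost: cheap kernels lose to the overhead.
for compute in (0.0001, 0.01, 1.0):
    h = host_time(compute)
    d = offload_time(compute, TRANSFER, LAUNCH, SPEEDUP)
    print("host %.4fs  offload %.4fs  offload wins: %s" % (h, d, d < h))

# Chaining ten medium kernels on data that stays on the device
# amortizes the single transfer (steps 2 and 3 above).
h = host_time(0.01, n_kernels=10)
d = offload_time(0.01, TRANSFER, LAUNCH, SPEEDUP, n_kernels=10)
print("chained: host %.4fs  offload %.4fs  offload wins: %s" % (h, d, d < h))
```

Under these assumed numbers, offloading only pays off once the compute time dominates the fixed transfer and launch costs, either through one expensive kernel or several chained ones.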
Cheers,
David
2009/9/17 Craig Swank <craig_swank@nrel.gov>
Oops. On further review, it looks like the OpenCL results below are just for my
CPU. I don't have GPU results and probably don't have a GPU. I'm going to
try on another computer and I'll update this post.
Craig
On Sep 17, 2009, at 12:59 PM, Craig Swank wrote:
Hello,
I am just looking at OpenCL for the first time today. Looks pretty
neat. I added the following lines to benchmark-all.py:
c_result2 = numpy.empty_like(a)
time1 = time()
c_result2 = a + b
c_result2 = c_result2 * (a + b)
c_result2 = c_result2 * (a / 2.0)
time2 = time()
print "Execution time of test without OpenCL, but with numpy: ", time2 - time1, "s"
to do the same calculations the way numpy was designed to do them, and got
the following results (edited for readability):
Execution time of test without OpenCL: 23.8333249092 s
Execution time of test without OpenCL, but with numpy: 7.41481781006e-05 s
Execution time of test: 0.014881 s
The numpy way is quite a bit faster. My question is: is there a use
case where OpenCL would overtake numpy for these types of
calculations? Or maybe I just have a sucky GPU? I don't know.
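For scale, the ratios between the three reported timings can be worked out directly. This is only arithmetic on the numbers quoted above; the variable names are illustrative, and a single measurement as short as the numpy one is sensitive to timer resolution, so the ratios should be read as rough.

```python
# Timings as reported above, in seconds.
t_loop = 23.8333249092        # test without OpenCL (plain Python loops)
t_numpy = 7.41481781006e-05   # test without OpenCL, but with numpy
t_opencl = 0.014881           # test with OpenCL

print("loop / numpy   : %.0fx" % (t_loop / t_numpy))
print("loop / OpenCL  : %.0fx" % (t_loop / t_opencl))
print("OpenCL / numpy : %.0fx" % (t_opencl / t_numpy))
```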
Craig
_______________________________________________
PyOpenCL mailing list
PyOpenCL@tiker.net
http://tiker.net/mailman/listinfo/pyopencl_tiker.net