Hello all,
I finally bit the bullet and got radix working in PyOpenCL :)
It also improves on the SDK example because it sorts keys and values,
mostly thanks to my advisor.
Additionally, this sort handles any array size as long as it is a power
of 2. The shipped example does not allow arrays smaller than 32768 elements, but
I've hooked up their naive scan to handle all smaller arrays.
https://github.com/enjalot/adventures_in_opencl/tree/master/experiments/rad…
All you really need are radix.py, RadixSort.cl and Scan_b.cl;
some simple tests are at the bottom of radix.py.
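For checking a GPU sort like this one, a plain-Python CPU reference can be handy. The sketch below is not the code in radix.py; it is a hypothetical least-significant-digit radix sort of (key, value) pairs, assuming unsigned integer keys of at most 32 bits, 4-bit digits, and the same power-of-2 length restriction. Each pass is a stable counting sort whose per-digit offsets come from an exclusive prefix sum (scan), which is roughly the role a scan kernel plays on the GPU.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; n & (n - 1) clears the lowest set bit."""
    return n > 0 and (n & (n - 1)) == 0


def radix_sort_pairs(keys, values, key_bits=32, radix_bits=4):
    """Stable LSD radix sort of (key, value) pairs on the CPU.

    Assumes unsigned integer keys below 2**key_bits and, to mirror the
    GPU sort's restriction, a power-of-2 number of elements.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if not is_power_of_two(len(keys)):
        raise ValueError("number of elements must be a power of 2")
    radix = 1 << radix_bits
    mask = radix - 1
    keys, values = list(keys), list(values)
    for shift in range(0, key_bits, radix_bits):
        # 1. Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum (scan): where each digit's block starts.
        offsets, total = [0] * radix, 0
        for d in range(radix):
            offsets[d], total = total, total + counts[d]
        # 3. Stable scatter: each value moves together with its key.
        out_keys, out_values = [0] * len(keys), [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]], out_values[offsets[d]] = k, v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values
```

Because every pass is stable, equal keys keep their original order: radix_sort_pairs([3, 1, 2, 1], ["a", "b", "c", "d"]) gives keys [1, 1, 2, 3] and values ["b", "d", "c", "a"].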
I hammered this out because I need it for a project, so it's not all that clean,
and I didn't add support for sorting keys only (although it wouldn't take
much to add, and I intend to at a later time when I need the
functionality). Hopefully this helps someone else out there. I'll also be
porting it to my own OpenCL C++ wrappers to include in my fluid
simulation library at some point.
I also began looking at AMD's radix sort from their SPH tutorial, but it uses
local atomics, which are not supported on my 9600M.
--
Ian Johnson
http://enja.org
Hello,
FWIW, I posted the outputs from the 'test_array.py' script at http://mypage.iu.edu/~heiland/pyopencl/
I realize the failures (and fixes) here may be due to Andreas not having an OS X platform to test on, so I don't intend this post to be a nuisance. I'm simply curious what results people are getting (from tests and examples) on various platforms, and thought I'd offer some from OS X.
Btw, if there is any guidance on debugging PyOpenCL, I'd certainly welcome it and try to contribute.
-Randy
Hello
I've made a little test script to test the bilinear interpolation of
image2d_t objects.
The script:
import pyopencl as cl
import numpy as np
import cv2 # OpenCV 2.3.1
Img = cv2.imread("Test.jpg") # read image with width = 709 px and height = 472 px
Img = cv2.cvtColor(Img, cv2.COLOR_BGR2GRAY) # convert to grayscale
print(Img.shape) # prints: (472L, 709L) under Python 2, i.e. (height, width)
OutImg = np.empty(shape=Img.shape, dtype=np.uint8) # create Output-Image
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
dev_Img = cl.Image(ctx,
mf.READ_ONLY | mf.COPY_HOST_PTR,
cl.ImageFormat(cl.channel_order.R,
cl.channel_type.UNSIGNED_INT8),
hostbuf=Img)
dev_OutImg = cl.Image(ctx,
mf.WRITE_ONLY | mf.ALLOC_HOST_PTR,
cl.ImageFormat(cl.channel_order.R,
cl.channel_type.UNSIGNED_INT8),
shape=Img.shape)
prg = cl.Program(ctx, """
const sampler_t smp = CLK_NORMALIZED_COORDS_TRUE |
CLK_FILTER_LINEAR | CLK_ADDRESS_NONE;
__kernel void ImageCopy(__read_only image2d_t Img, __write_only image2d_t Out)
{
const int2 dims = get_image_dim(Img);
const int2 Coords = (int2)(get_global_id(0), get_global_id(1));
const float2 NormCoords = (convert_float2(Coords) +
(float2)(0.5f) + (float2)(0.4f)) / convert_float2(dims); // = ((x + 0.5 + 0.4)/w, (y + 0.5 + 0.4)/h)
uint4 Pixel = read_imageui(Img, smp, NormCoords);
write_imageui(Out, Coords, Pixel);
}
""").build()
prg.ImageCopy(queue, Img.shape, None, dev_Img, dev_OutImg)
cl.enqueue_read_image(queue, dev_OutImg, (0, 0), OutImg.shape,
OutImg).wait()
cv2.imwrite("Out.jpg", OutImg)
The problem: this is the input image:
http://s7.directupload.net/file/d/2692/hkbb8pn2_jpg.htm
When I add a small offset to the image coordinates (here 0.4f), the
output looks like this:
http://s1.directupload.net/file/d/2692/xuqzmtp4_jpg.htm
When I don't add this small offset, or when I use CLK_FILTER_NEAREST, the
result is OK.
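For comparison, the OpenCL 1.x specification defines CLK_FILTER_LINEAR with normalized coordinates as follows: with u = s * width, take i0 = floor(u - 0.5) and a = frac(u - 0.5) (likewise j0 and b from t and the height), then blend the four neighboring texels with weights (1 - a)(1 - b), a(1 - b), (1 - a)b and ab. With the usual +0.5 pixel-centre offset, a = 0 and the read returns the texel unchanged, which matches the "no offset is fine" observation; an extra 0.4 makes a = 0.4, so neighbors get blended. Note also that, as far as I can tell from the spec, read_imageui is only defined for CLK_FILTER_NEAREST; linear filtering is normally done on a normalized channel type (for example cl.channel_type.UNORM_INT8) read with read_imagef, so the integer format here may itself be a source of trouble. The function below is a hypothetical CPU reference of that formula; clamp-to-edge border handling is an assumption, since CLK_ADDRESS_NONE leaves out-of-range reads undefined.

```python
import math


def bilinear_sample(img, s, t):
    """CPU reference for CLK_FILTER_LINEAR with normalized coordinates.

    img is a list of rows (height x width); s runs along the width (x),
    t along the height (y). Borders are clamped to the edge (assumption).
    """
    height, width = len(img), len(img[0])
    u = s * width - 0.5   # x is scaled by the number of COLUMNS
    v = t * height - 0.5  # y is scaled by the number of ROWS
    i0, j0 = int(math.floor(u)), int(math.floor(v))
    a, b = u - i0, v - j0  # fractional parts = interpolation weights

    def texel(i, j):
        i = min(max(i, 0), width - 1)
        j = min(max(j, 0), height - 1)
        return img[j][i]

    return ((1 - a) * (1 - b) * texel(i0, j0)
            + a * (1 - b) * texel(i0 + 1, j0)
            + (1 - a) * b * texel(i0, j0 + 1)
            + a * b * texel(i0 + 1, j0 + 1))
```

With the 2 x 2 image [[0, 100], [0, 100]], sampling pixel (0, 0) at s = t = (0 + 0.5 + 0.4) / 2 gives 0.6 * 0 + 0.4 * 100 = 40 (up to floating-point rounding), while s = t = 0.5 / 2 gives 0. Some GPUs (NVIDIA's texture units among them) compute these weights in low-precision fixed point, so device results may differ slightly from this reference.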
The effect is proportional to the ratio of the image height to width
and to the absolute value of the offset.
Also, only the offset in the image-width direction has an influence (if I add
(float2)(0.4f, 0.0f), the result is nearly OK; 4 diagonal pixel bands
remain).
A square image always produces the correct output.
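One hypothesis consistent with the square-image observation (not confirmed in this thread): NumPy/OpenCV arrays are indexed (rows, cols), i.e. (height, width), while, as far as I know, OpenCL image dimensions and the global work size are in (width, height) order, so passing Img.shape directly describes a 472-pixel-wide, 709-pixel-tall image. As long as every step uses the same swapped shape, a nearest-neighbor copy just moves the same bytes and looks correct, but a filter that blends neighbors in the assumed layout mixes pixels that are not actually adjacent in the real image. The helpers below are hypothetical and only illustrate the index arithmetic; in the script, the corresponding change would be to pass Img.shape[::-1] as the image shape and the global size.

```python
def numpy_to_opencl_shape(numpy_shape):
    """(rows, cols) == (height, width) from NumPy -> (width, height) for OpenCL."""
    height, width = numpy_shape
    return (width, height)


def where_pixel_lands(row, col, true_width, assumed_width):
    """(row, col) at which pixel (row, col) of a C-ordered buffer appears
    when the same buffer is read with rows of length assumed_width."""
    offset = row * true_width + col  # linear position in memory
    return divmod(offset, assumed_width)
```

numpy_to_opencl_shape((472, 709)) returns (709, 472). where_pixel_lands(1, 0, 709, 472) returns (1, 237): the first pixel of the real second row shows up 237 columns into the second assumed row, whereas for a square image (true_width == assumed_width) every pixel stays where it is.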
The script runs on an NVIDIA NVS 3100M with image support.
I've tried swapping the image shapes and coordinates in every way, inside and
outside the kernel, with no good result.
I hope somebody can guide me back onto the right path.
Thx, Sylvan
Hi Andreas,
----- Original Message -----
"Andreas Kloeckner" <lists(a)informa.tiker.net>:
>
> On Sat, 29 Oct 2011 14:10:39 +0200 (CEST), Vincent Favre-Nicolin
> <vincent.favre-nicolin(a)cea.fr> wrote:
> > return enqueue_nd_range_kernel(queue, self, global_size, local_size,
> > > global_offset, wait_for, g_times_l=g_times_l)
> > E MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure
>
> The test_random error I saw as well, so that's ok. This latter one is
> also mystifying. What does it think it's allocating here? How much
> memory do you have in your machine? Any chance that you just
> physically ran out of memory?
I doubt it, though the graphics card is not very powerful (this part of the test failed on the "ATI Radeon HD 6630M"). The computer has 4 GB of memory (and was freshly rebooted), and the card has 1 GB of memory (according to Wikipedia).
> Ah, I just had an idea: Can you print local_size right before this
> failure? If it's bigger than 32K, that might be the issue, although
> I'm not sure that'd be the case...
global_size,local_size: (2304,) (128,)
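These sizes can be checked against the basic OpenCL 1.x ND-range rules: each global dimension must be a multiple of the corresponding local one, each local dimension must not exceed device.max_work_item_sizes, and the product of the local sizes must not exceed the work-group limit (device.max_work_group_size, or the possibly smaller per-kernel value from kernel.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, device)). The helper below is a hypothetical plain-Python sketch of those checks; the limits in the usage example are made-up numbers, not the HD 6630M's actual values.

```python
def check_nd_range(global_size, local_size, max_work_group_size, max_work_item_sizes):
    """Return a list of violated OpenCL 1.x ND-range rules (empty list = OK)."""
    if local_size is None:
        return []  # the implementation chooses a valid local size itself
    if len(global_size) != len(local_size):
        return ["global and local sizes have different numbers of dimensions"]
    problems = []
    group_size = 1
    for dim, (g, l) in enumerate(zip(global_size, local_size)):
        if l < 1:
            problems.append("dim %d: local size must be positive" % dim)
            continue
        if g % l:
            problems.append("dim %d: global size %d is not a multiple of local size %d"
                            % (dim, g, l))
        if l > max_work_item_sizes[dim]:
            problems.append("dim %d: local size %d exceeds max_work_item_sizes[%d] = %d"
                            % (dim, l, dim, max_work_item_sizes[dim]))
        group_size *= l
    if group_size > max_work_group_size:
        problems.append("work-group size %d exceeds limit %d"
                        % (group_size, max_work_group_size))
    return problems
```

With limits like these, check_nd_range((2304,), (128,), 256, (256, 256, 256)) returns an empty list (2304 = 18 * 128), so the reported sizes would not by themselves break the rules.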
Not so big, so... But I've seen a few problems on the ATI card: sometimes I get into a state where OpenCL initialization becomes impossible, and I have to reboot. I'm not sure whether a soft reset is possible. But that's a different issue.
Vincent
Hi Vincent,
On Sat, 29 Oct 2011 14:10:39 +0200 (CEST), Vincent Favre-Nicolin <vincent.favre-nicolin(a)cea.fr> wrote:
> return enqueue_nd_range_kernel(queue, self, global_size, local_size,
> > global_offset, wait_for, g_times_l=g_times_l)
> E MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure
The test_random error I saw as well, so that's ok. This latter one is
also mystifying. What does it think it's allocating here? How much
memory do you have in your machine? Any chance that you just physically
ran out of memory?
Ah, I just had an idea: Can you print local_size right before this
failure? If it's bigger than 32K, that might be the issue, although I'm
not sure that'd be the case...
Andreas
Hi Yosuke,
On Sun, 25 Sep 2011 21:22:04 +0900, onoue(a)likr-lab.com wrote:
> I found an Apple OpenCL bug on Mac OS X Lion. If the device type is CPU,
> device.max_work_item_sizes[0] returns 1024, but an invalid work group
> size error occurs when local_size[0] is greater than 128. Also, for reasons
> I don't understand, ElementwiseKernel works only when local_size is None
> or (1, 1, 1). I have already reported this bug to Apple.
>
> Incidentally, I got the following error when I ran a script on Apple's
> OpenCL using PyOpenCL source code from the git repository.
>
> Traceback (most recent call last):
>   File "montecarlo.py", line 27, in <module>
>     kernel = create_kernel(context, D)
>   File "montecarlo.py", line 17, in create_kernel
>     arguments=arguments)
>   File "/Users/likr/git/pyopencl/pyopencl/reduction.py", line 268, in __init__
>     name=name+"_stage1", options=options, preamble=preamble)
>   File "/Users/likr/git/pyopencl/pyopencl/reduction.py", line 234, in get_reduction_kernel
>     name, preamble, device, max_group_size)
>   File "/Users/likr/git/pyopencl/pyopencl/reduction.py", line 185, in get_reduction_source
>     no_sync_size = min(get_dev_no_sync_size(dev) for dev in devices)
>   File "/Users/likr/git/pyopencl/pyopencl/reduction.py", line 185, in <genexpr>
>     no_sync_size = min(get_dev_no_sync_size(dev) for dev in devices)
>   File "/Users/likr/git/pyopencl/pyopencl/reduction.py", line 175, in get_dev_no_sync_size
>     result = get_simd_group_size(device)
>   File "/Users/likr/git/pyopencl/pyopencl/characterize.py", line 271, in get_simd_group_size
>     if dtype.itemsize == 1:
> NameError: global name 'dtype' is not defined
This should not be an issue any more with the current git version of
PyOpenCL, soon to be PyOpenCL 2011.2.
Thanks for the report, and apologies for the long delay.
Andreas
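For the Apple CPU problem in Yosuke's report (max_work_item_sizes[0] says 1024, yet local sizes above 128 fail), two workarounds seem reasonable, both assumptions rather than tested fixes: pass None as the local size, which the report found to work for ElementwiseKernel, so the implementation picks one; or query the per-kernel limit with kernel.get_work_group_info(cl.kernel_work_group_info.WORK_GROUP_SIZE, device), which may be smaller than the device-wide figure, and stay at or below it. The hypothetical helper below covers the second part in plain Python: it returns the largest local size that divides the global size and does not exceed a given cap.

```python
def pick_local_size(global_size, cap):
    """Largest local size <= cap that divides global_size evenly.

    OpenCL 1.x requires the global size to be a multiple of the local size.
    """
    if global_size < 1 or cap < 1:
        raise ValueError("global_size and cap must be positive")
    # local = 1 always divides global_size, so the loop always returns.
    for local in range(min(cap, global_size), 0, -1):
        if global_size % local == 0:
            return local
```

pick_local_size(2304, 128) returns 128 and pick_local_size(1000, 128) returns 125, but pick_local_size(97, 64) falls all the way to 1 because 97 is prime; for awkward sizes, padding the global size up to a multiple of the cap is usually the better design.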
I am having trouble working out how to clear the kernel cache. The docs
suggest clear_context_caches() from pyopencl.tools, but it doesn't seem to be
there:
Python 3.2 (r32:88445, Jun 9 2011, 09:31:13)
[GCC 4.5.1 20101208 [gcc-4_5-branch revision 167585]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyopencl.tools import clear_context_caches
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name clear_context_caches
>>> from pyopencl import VERSION
>>> VERSION
(2011, 1, 2)
What should I be doing?
thanks,
andrew
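It is possible that the documentation describes a newer PyOpenCL than the installed 2011.1.2 (an assumption; this thread does not say which release added clear_context_caches). A defensive pattern is to look the function up at run time and compare the VERSION tuple instead of assuming it exists. The helpers below are hypothetical and use only the standard library.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name if the module imports and has it, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


def version_at_least(installed, required):
    """Compare version tuples such as (2011, 1, 2) element by element."""
    return tuple(installed) >= tuple(required)


# Usage sketch (not executed here):
#   clear = optional_attr("pyopencl.tools", "clear_context_caches")
#   if clear is not None:
#       clear()
```

Tuples compare element by element, so version_at_least((2011, 1, 2), (2011, 2)) is False.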
AMD released the Aparapi project, which allows Java developers to use the GPU, and it does so in a wonderful way.
First, you don't write C code at all. It converts the Java code at runtime so that it can run on the GPU. PyOpenCL does that as well via metaprogramming. check
Second, it is smart enough to decide, from a single code base, whether to execute on an OpenCL device or not. You never write two separate versions of the code. If an OpenCL device is detected it uses it; if not, it uses a Java thread pool. Essentially, it takes all the headache away and automatically covers all your bases with a single code base.
Can PyOpenCL adapt itself if no OpenCL device is detected?
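As far as I know PyOpenCL does not fall back automatically, but the pattern is short to write by hand: look for a device, and if PyOpenCL, a platform, or the device itself fails, run a plain Python path instead. Unlike Aparapi, the kernel is still written in OpenCL C. The sketch below uses element-wise vector addition as a stand-in workload; KERNEL_SRC, find_opencl_devices, _add_opencl and add_vectors are hypothetical names.

```python
KERNEL_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def find_opencl_devices():
    """All OpenCL devices, or [] if PyOpenCL or a platform is unavailable."""
    try:
        import pyopencl as cl
        return [dev for plat in cl.get_platforms() for dev in plat.get_devices()]
    except Exception:  # ImportError, no ICD/platform found, driver errors, ...
        return []


def _add_opencl(a, b, device):
    import numpy as np
    import pyopencl as cl
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_np = np.asarray(a, dtype=np.float32)
    b_np = np.asarray(b, dtype=np.float32)
    out_np = np.empty_like(a_np)
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out_np.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    prg.vec_add(queue, a_np.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out_np, out_buf)  # blocking copy back to the host
    return [float(x) for x in out_np]


def add_vectors(a, b):
    """Element-wise a + b: OpenCL if a device works, plain Python otherwise."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    devices = find_opencl_devices()
    if devices and len(a) > 0:
        try:
            return _add_opencl(a, b, devices[0])
        except Exception:
            pass  # fall through to the CPU path, similar in spirit to Aparapi
    return [x + y for x, y in zip(a, b)]
```

The broad except clauses are what make the fallback automatic, but they also hide genuine kernel bugs, so logging the exception before falling back would be wiser in real code. Note too that the OpenCL path computes in float32 while the Python path uses double precision.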
Hi Pedro,
On Mon, 3 Oct 2011 23:10:46 -0700, Pedro Marcal <pedrovmarcal(a)gmail.com> wrote:
> I installed the binary from the Irvine site, but cannot get the included
> example to execute.
> It looks like I need some path or a .dll is missing.
> Here is the diagnostic for the first few lines:
>
> import pyopencl as cl
> import numpy
> import numpy.linalg as la
>
> resulting in the following error report.
>
> Traceback (most recent call last):
>   File "C:\Python26\OpenCl_ex.py", line 1, in <module>
>     import pyopencl as cl
>   File "C:\Python26\lib\site-packages\pyopencl\__init__.py", line 4, in <module>
>     import pyopencl._cl as _cl
> ImportError: DLL load failed: The specified module could not be found.
Check the directory C:\Python26\lib\site-packages\pyopencl\ and look for
a file named _cl.pyd or _cl.dll. Then use the tool 'Dependency Walker'
to check whether it can find all the DLLs it needs.
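Part of that check can also be scripted with the standard library. The sketch below (hypothetical helper names) lists the _cl.* extension files in a package directory and asks ctypes whether an OpenCL runtime library can be found on the search path. A missing OpenCL.dll (no GPU driver or SDK runtime installed) is one possible cause of this ImportError, though Dependency Walker gives the complete picture of every DLL the extension needs.

```python
import ctypes.util
import os


def find_cl_extension(package_dir):
    """Return the _cl.* files (e.g. _cl.pyd) in package_dir, or [] if it doesn't exist."""
    if not os.path.isdir(package_dir):
        return []
    return sorted(name for name in os.listdir(package_dir) if name.startswith("_cl."))


def opencl_runtime_path():
    """Path or name of the OpenCL runtime the loader can find, or None.

    On Windows, ctypes.util.find_library searches the directories on PATH
    for OpenCL.dll; on Linux it looks for libOpenCL.so.
    """
    return ctypes.util.find_library("OpenCL")


if __name__ == "__main__":
    pkg = r"C:\Python26\lib\site-packages\pyopencl"
    print("extension files:", find_cl_extension(pkg))
    print("OpenCL runtime:", opencl_runtime_path())
```

If the extension file is present but opencl_runtime_path() returns None, installing or repairing the vendor's OpenCL driver is the first thing to try.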
HTH,
Andreas