PyCUDA 2011.2.2, Ubuntu 11.10, gcc 4.4, Python 2.7. I am tired because every
next step brings a new error, and I have been trying for 2-3 days.
I am getting this error when running "make -j 4" in the pycuda-2011.2.2 directory:
/usr/bin/ld: cannot find -lboost_python-mt
/usr/bin/ld: cannot find -lcuda
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib/libcurand.so when
searching for -lcurand
I tried this, but it's not helping:
$ sudo ln -s /usr/lib/libboost_python-mt-py26 /usr/lib/libboost_python-mt
Additional information:
I don't see an "ld" directory in /usr/bin (note: /usr/bin/ld is the
linker executable, not a directory). The file "libboost_python-mt-py26"
is present in /usr/lib. Also, I installed PyCUDA despite the error
above, and then got this error on ">>> import pycuda.driver as cuda":
from pycuda._driver import *
ImportError: No module named _driver
Can this be fixed without solving the first problem?
Any help is appreciated, thanks
Nope, nothing has changed since then. If ctypes works for you, that's
great. However, I don't think using codepy is that difficult - if you
look at the example I posted, you'll see it's not that much work to
compile things with nvcc using codepy.
http://wiki.tiker.net/PyCuda/Examples/ThrustInterop
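In outline, that example pairs a BoostPythonModule (the host-side entry
point, compiled by the host compiler) with a CudaModule (the Thrust
call, compiled by nvcc) and lets codepy link the two. A condensed
sketch follows -- it abbreviates the wiki page from memory, so check
the exact codepy/cgen spellings against the link above:

import numpy as np
import pycuda.autoinit                  # initializes CUDA, creates a context
import pycuda.gpuarray as gpuarray
import codepy.toolchain
from codepy.cgen import (Include, Statement, Block, Value,
                         FunctionDeclaration, FunctionBody)
from codepy.bpl import BoostPythonModule
from codepy.cuda import CudaModule

host_mod = BoostPythonModule()          # compiled by the host compiler
nvcc_mod = CudaModule(host_mod)         # compiled by nvcc, linked in

nvcc_mod.add_to_preamble([Include(h) for h in
    ('thrust/sort.h', 'thrust/device_ptr.h', 'cuda.h')])
nvcc_mod.add_function(FunctionBody(
    FunctionDeclaration(Value('void', 'my_sort'),
        [Value('CUdeviceptr', 'input_ptr'), Value('int', 'length')]),
    Block([Statement('thrust::device_ptr<float> p((float *) input_ptr)'),
           Statement('thrust::sort(p, p + length)')])))

host_mod.add_to_preamble([Include('cuda.h'),   # for CUdeviceptr
                          Include('boost/python/extract.hpp'),
                          Statement('using namespace boost::python')])
host_mod.add_function(FunctionBody(
    FunctionDeclaration(Value('object', 'host_entry'),
        [Value('object', 'gpu_array')]),
    Block([Statement(s) for s in (
        'tuple shape = extract<tuple>(gpu_array.attr("shape"))',
        'int length = extract<int>(shape[0])',
        'CUdeviceptr ptr = extract<CUdeviceptr>(gpu_array.attr("gpudata"))',
        'my_sort(ptr, length)',          # the nvcc-compiled Thrust call
        'return gpu_array')])))

module = nvcc_mod.compile(codepy.toolchain.guess_toolchain(),
                          codepy.toolchain.guess_nvcc_toolchain())

b = gpuarray.to_gpu(np.random.rand(100).astype(np.float32))
module.host_entry(b)                    # sorts b in place on the GPU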
Perhaps the Boost dependence scares people off - but if you're using
PyCUDA, you are already using Boost. ;)
It might be useful to include more of Thrust as a precompiled binary,
but in general it's not possible to compile all of Thrust, since it's
a template library. Even when you restrict yourself to a basic set of
fundamental types, if you allow tuples, the combinatorics are
prohibitive.
- bryan
On Thu, May 24, 2012 at 3:46 AM, Igor <rychphd(a)gmail.com> wrote:
> Hi Andreas, (Hi Bryan),
>
> Last December I was asking you about CodePy. See how far I went with
> it with your help: http://dev.math.canterbury.ac.nz/home/pub/17/
>
> Note, there is no CUDA or thrust code in the CodePY example. There
> seemed to be no easy way to do it. I'll paste some excerpts from our
> emails from Dec 16-17:
>
> I: "My next question, suppose MODULE_CODE contains some thrust code and
> would have to be compiled through nvcc (and g++). Simply using
> nvcc_toolchain,
>
> nvcc_toolchain = codepy.toolchain.guess_nvcc_toolchain()
> cmod = extension_from_string(nvcc_toolchain, "module", MODULE_CODE)
>
> Didn't work of course. Do you have a similar function that takes a
> STRING, both host_toolchain, and nvcc_toolchain, and compiles it? If
> not, what is the right way?"
>
> B: "NVCC can't parse Boost well, so I have to segregate the host code
> which binds it to Python from the CUDA code compiled by NVCC.
> The way I do this is to create a codepy.bpl.BoostPythonModule which
> has the host entry point (and will be compiled by g++). Then I create
> a codepy.cuda.CudaModule which references the BoostPythonModule
> (making this link explicit lets codepy compile them together into a
> single binary). Then I call compile on the CudaModule, which should
> do the right thing. You can see code that does this here:
> http://code.google.com/r/bryancatanzaro-copperhead/source/browse/copperhead…"
>
> A: "I'd just like to add that I recently split out the code generation bits
> of codepy and called them cgen.
>
> http://pypi.python.org/pypi/cgen
> https://github.com/inducer/cgen
> https://github.com/inducer/codepy
>
> (but compatibility wrappers that wrap cgen into codepy will stay in
> place for a while)"
>
> Has something changed since then?
>
> ctypes works fine, and it has the advantage of not requiring Boost.
> It's just unaltered C++/CUDA/thrust code. Invoking the system's nvcc
> was as easy as invoking gcc. As for caching, I check the hash of the
> source string: if it has changed, I build and load a new (small!) .so
> module with the hash value attached to the name. The pointers into the
> old .so get garbage collected and it is unloaded; if the .so files are
> stored in a tmp folder, they get deleted eventually.
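A minimal sketch of that caching scheme (hypothetical names, error
handling omitted):

import ctypes, hashlib, os, subprocess, tempfile

def load_cached(source, cache_dir=tempfile.gettempdir()):
    # Key the binary on the hash of the source string: unchanged source
    # reuses the existing .so, changed source builds and loads a fresh
    # module alongside the old one.
    tag = hashlib.md5(source.encode()).hexdigest()
    so_path = os.path.join(cache_dir, 'mod_%s.so' % tag)
    if not os.path.exists(so_path):
        cu_path = os.path.join(cache_dir, 'mod_%s.cu' % tag)
        with open(cu_path, 'w') as f:
            f.write(source)
        subprocess.check_call(['nvcc', '--shared', '-Xcompiler', '-fPIC',
                               '-o', so_path, cu_path])
    return ctypes.CDLL(so_path)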
>
> I remember you preferred Boost::Python to ctypes in general for its
> better performance; but if calls into the ctypes library are rare, any
> small additional overhead is unimportant.
>
> A better programme would be to port all of Thrust's algorithms and
> interfaces to PyCUDA. The only reason I need Thrust, for example, is
> that it can find me the extremum element's _location_, which I still
> don't know how to do in PyCUDA.
>
> Cheers,
> Igor
>
> On Thu, May 24, 2012 at 11:58 AM, Andreas Kloeckner
> <lists(a)informa.tiker.net> wrote:
>> Hi Igor,
>>
>> On Thu, 24 May 2012 10:51:55 +1200, Igor <rychphd(a)gmail.com> wrote:
>>> Andreas, thanks, but it currently assumes Linux. I'll see if I can
>>> make it work on Windows. Or maybe I'll submit and someone will try it
>>> on Windows. I just need to extract it from Sage into a plain Python
>>> script. Give me a couple of days.
>>> http://dev.math.canterbury.ac.nz/home/pub/14/
>>> http://dev.math.canterbury.ac.nz/home/pub/19/
>>
>> I would actually suggest you use the codepy machinery to let nvcc do the
>> compilation--this has the advantage that a) there is code out there that
>> makes this work on Windows (Bryan?) and b) you get compiler caching for
>> free.
>>
>> All you'd need to do is build an analog of extension_from_string, say
>> ctypes_dll_from_string. Just imitate this code here, where
>> compile_from_string does all the hard work:
>>
>> https://github.com/inducer/codepy/blob/master/codepy/jit.py#L146
>>
>> In any case, even if you can make something that's Linux-only, it would
>> likely help a big bunch of people. Windows support can always be added
>> later.
>>
>> Andreas
>
Hi Igor,
On Thu, 24 May 2012 10:51:55 +1200, Igor <rychphd(a)gmail.com> wrote:
> Andreas, thanks, but it currently assumes Linux. I'll see if I can
> make it work on Windows. Or maybe I'll submit and someone will try it
> on Windows. I just need to extract it from Sage into a plain Python
> script. Give me a couple of days.
> http://dev.math.canterbury.ac.nz/home/pub/14/
> http://dev.math.canterbury.ac.nz/home/pub/19/
I would actually suggest you use the codepy machinery to let nvcc do the
compilation--this has the advantage that a) there is code out there that
makes this work on Windows (Bryan?) and b) you get compiler caching for
free.
All you'd need to do is build an analog of extension_from_string, say
ctypes_dll_from_string. Just imitate this code here, where
compile_from_string does all the hard work:
https://github.com/inducer/codepy/blob/master/codepy/jit.py#L146
In any case, even if you can make something that's Linux-only, it would
likely help a big bunch of people. Windows support can always be added
later.
Andreas
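Concretely, such a ctypes_dll_from_string might be sketched as below.
Treat it as a guess at the shape of the code: the argument list and the
(mod_name, path, recompiled) return convention of compile_from_string
are assumed from the jit.py linked above and should be checked against
the codepy version in use.

import ctypes
import codepy.toolchain
from codepy.jit import compile_from_string

def ctypes_dll_from_string(name, source):
    # Let codepy drive nvcc; compile_from_string caches on the source
    # and toolchain state, so unchanged code is not rebuilt.
    toolchain = codepy.toolchain.guess_nvcc_toolchain()
    # Assumed return convention: (module name, binary path, recompiled?).
    mod_name, dll_path, recompiled = compile_from_string(
        toolchain, name, source, source_name=name + '.cu')
    return ctypes.CDLL(dll_path)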
On Thu, 24 May 2012 09:54:15 +1200, Igor <rychphd(a)gmail.com> wrote:
> From within Python, you can make a string containing your C++ code
> that uses thrust, save it to file, invoke nvcc to build it as a
> library. Then use ctypes to load it and get the handle to the
> function.
>
> The function can accept a device pointer. Then, suppose you've done
> some work using PyCUDA. Get the GPU pointer and pass it to the
> function -- no memory copying occurs here. Call the function letting
> thrust do its stuff inside there and the output can be a device
> pointer again or whatever.
>
> That's the way I combine the two great libraries PyCUDA and thrust.
> Tell me if it sounds suitable and I'll send you an example.
If you wouldn't mind contributing an example for the wiki, I'd be very
happy to add it. :)
Andreas
http://dev.math.canterbury.ac.nz/home/pub/14/
http://dev.math.canterbury.ac.nz/home/pub/19/
There are some tricks that took me a while to get right. I've polished
and simplified it since then, but it's buried in some production code,
not easy to extract. See other published worksheets on that server
though.
Igor
On Thu, May 24, 2012 at 10:22 AM, Periwal, Vipul (NIH/NIDDK) [E]
<vipulp(a)niddk.nih.gov> wrote:
> I'd be very interested in any example of what you outlined in your email on the PyCUDA email list.
>
> Thanks,
> Vipul Periwal
From within Python, you can make a string containing your C++ code
that uses thrust, save it to file, invoke nvcc to build it as a
library. Then use ctypes to load it and get the handle to the
function.
The function can accept a device pointer. Then, suppose you've done
some work using PyCUDA. Get the GPU pointer and pass it to the
function -- no memory copying occurs here. Call the function letting
thrust do its stuff inside there and the output can be a device
pointer again or whatever.
That's the way I combine the two great libraries PyCUDA and thrust.
Tell me if it sounds suitable and I'll send you an example.
Igor
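Such an example might look roughly like the sketch below (illustrative
only, not Igor's actual code; the Thrust routine shown finds the
minimum element's _location_, the use case mentioned earlier in the
thread, and the hash-based caching sketched earlier is omitted for
brevity):

import ctypes, os, subprocess, tempfile
import numpy as np
import pycuda.autoinit                  # initializes CUDA, creates a context
import pycuda.gpuarray as gpuarray

SOURCE = r'''
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>

// Return the index of the smallest element of a device array.
extern "C" int min_element_index(float *data, int n)
{
    thrust::device_ptr<float> begin(data);
    return thrust::min_element(begin, begin + n) - begin;
}
'''

# Save the string, build a shared library with the system's nvcc, load it.
workdir = tempfile.mkdtemp()
cu_path = os.path.join(workdir, 'minloc.cu')
so_path = os.path.join(workdir, 'minloc.so')
with open(cu_path, 'w') as f:
    f.write(SOURCE)
subprocess.check_call(['nvcc', '--shared', '-Xcompiler', '-fPIC',
                       '-o', so_path, cu_path])
lib = ctypes.CDLL(so_path)

# Do some work with PyCUDA, then hand the raw device pointer to the
# Thrust routine -- no host/device copy happens at the call.
a = gpuarray.to_gpu(np.random.rand(1000).astype(np.float32))
idx = lib.min_element_index(ctypes.c_void_p(int(a.gpudata)),
                            ctypes.c_int(int(a.size)))
assert idx == np.argmin(a.get())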
On Wed, May 23, 2012 at 7:58 AM, Apostolis Glenis <apostglen46(a)gmail.com> wrote:
> Just curious:
> What would it take to compile a thrust function with pyCUDA?
>
> Apostolis
>
Right. As you can see from the example I posted, you have to keep a
separation between host compiler code and nvcc code, which you then
link together. There are a few things to keep in mind.
1) NVCC cannot see any boost::python. I'm in the process of filing a
bug against boost::python, which contains some non-standard C++ that
will never be compilable by NVCC. Consequently, you'll need to do all
the manipulation of Python objects (access, construction) in host code
compiled with the host compiler.
2) The host compiler cannot see any GPU code. So all your calls to
Thrust, etc. should be done from the device module. You can include
your own code and link against your own libraries with the appropriate
Codepy calls.
3) As far as templates go, I've used two main strategies. As you
mentioned, one is to write a wrapper which instantiates the template,
and call that wrapper from the host code. The other is to use
explicit template instantiation in the device module, and use an
extern template instantiation in the host module. Both have worked
for me in the past.
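To illustrate the explicit/extern instantiation route (hypothetical
names, sketch only), the two translation units might look like this,
with DEVICE_SOURCE going through nvcc and HOST_SOURCE through the host
compiler:

DEVICE_SOURCE = r'''
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

template <typename T>
void sort_ptr(T *data, int n)
{
    thrust::device_ptr<T> p(data);
    thrust::sort(p, p + n);
}

// Explicit instantiation: emit the float version into this module.
template void sort_ptr<float>(float *, int);
'''

HOST_SOURCE = r'''
// Declaration only; the definition lives in the nvcc-compiled module.
template <typename T> void sort_ptr(T *data, int n);
extern template void sort_ptr<float>(float *, int);

void host_entry(float *gpu_ptr, int n)
{
    sort_ptr<float>(gpu_ptr, n);  // resolved at link time
}
'''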
- bryan
On Wed, May 23, 2012 at 2:20 AM, Apostolis Glenis <apostglen46(a)gmail.com> wrote:
> Really cool stuff. I guess I can have my thrust code in a different file
> and just compile that file at runtime, correct?
> One more thing is templates. If I have a function that requires a
> template argument, do I have to write a wrapper function to instantiate
> it at runtime?
>
> Thanks again,
>
> Apostolis
>
>
> 2012/5/23 Bryan Catanzaro <bcatanzaro(a)acm.org>
>>
>> Thanks and Done!
>>
>> - bryan
>>
>> On Tue, May 22, 2012 at 4:29 PM, Andreas Kloeckner
>> <lists(a)informa.tiker.net> wrote:
>> > On Tue, 22 May 2012 15:43:12 -0700, Bryan Catanzaro <bcatanzaro(a)acm.org>
>> > wrote:
>> >> Sure, here's an example of how to call thrust::sort on a PyCUDA
>> >> gpuarray.
>> >> https://gist.github.com/2772091
>> >
>> > Cool, like it! I've stolen this and put it here:
>> >
>> > http://wiki.tiker.net/PyCuda/Examples/ThrustInterop
>> >
>> > Bryan, can you please fill in a license there?
>> >
>> > Thanks!
>> > Andreas
>
>
Hi everyone,
I'm working on a reasonably large piece of Python software which uses
PyCUDA for the performance-critical section of the code. I've been
experiencing a memory leak, and while trying to track it down I've
noticed that PyCUDA has a large virtual memory footprint -- somewhere in
the ballpark of 36GB even when no arrays have yet been allocated. Is
this typical for PyCUDA, or is there perhaps something wrong with my
setup?
Thanks,
Brendan Wood
On Wed, 23 May 2012 08:55:22 -0400, Thomas Wiecki <Thomas_Wiecki(a)brown.edu> wrote:
> Hi,
>
> I get:
>
> Traceback (most recent call last):
> File "sim_drift_gpu.py", line 4, in <module>
> import pycuda.gpuarray as gpuarray
> File "/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/gpuarray.py",
> line 3, in <module>
> import pycuda.elementwise as elementwise
> File "/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/elementwise.py",
> line 33, in <module>
> from pycuda.tools import context_dependent_memoize
> File "/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/tools.py",
> line 30, in <module>
> import pycuda.driver as cuda
> File "/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/driver.py",
> line 545, in <module>
> _add_functionality()
> File "/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/driver.py",
> line 525, in _add_functionality
> Function._param_set = function_param_set_pre_v4
> NameError: global name 'function_param_set_pre_v4' is not defined
>
> I think there is a typo in line 145 in driver.py when CUDA < 4.0 is used:
> function_param_set -> function_param_set_pre_v4
>
> so that it matches line 524:
> Function._param_set = function_param_set_pre_v4
Fixed, thanks.
Andreas
Hello,
I am running a program which generally runs fine, but when I retrieve
the result (a matrix), sometimes it's OK and sometimes it has NaN
values inside. I can't understand that behaviour.
(I am using Linux, 64-bit.)