Basel Ali <basel_ebeed1(a)yahoo.com> writes:
> Thank you so much for the help. I did try remote debugging before sending the email, but I can't get it to work (I don't know which line should be written on which computer).
> I attached an image of the process of communicating with the cluster.
> I put these two lines in the code that is sent to and run on the compute nodes:
> from pudb.remote import set_trace
> set_trace(term_size=(80, 24), host="the ip address of the login node", port=9999)
> And I run this line on the login node:
> telnet localhost 9999
> But it fails. Am I doing this right? I am not knowledgeable about networking, so it's a bit hard for me.
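Please keep the mailing list cc'd for archival, and stop spamming folks.
As for your question: the debugger listens on the machine that runs the script (the compute node), so host is not the login node's address, and telnet has to connect to the compute node rather than localhost. Try this in the script: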
set_trace(term_size=(80, 24), port=9999)
And run this line on the login node, replacing compute-node with the hostname of the node your job is running on:
telnet compute-node 9999
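
To make explicit which line goes on which machine, here is a minimal sketch. The helper names telnet_command and break_here are hypothetical, not part of pudb. The value host="0.0.0.0" is an assumption: it asks pudb to listen on all interfaces of the compute node, which may be needed if your pudb version listens only on the loopback address by default. Check that against your installation and your cluster's network policy.

```python
import socket


def telnet_command(port, node=None):
    """Build the command to type on the login node."""
    # Default to the hostname of the machine this code runs on,
    # i.e. the compute node when called from the submitted script.
    node = node or socket.gethostname()
    return f"telnet {node} {port}"


def break_here(port=9999):
    """Call this in the script that runs on the compute node."""
    # Imported lazily so telnet_command() works without pudb installed.
    from pudb.remote import set_trace

    print("On the login node, run:", telnet_command(port), flush=True)
    # host is the address pudb listens on, not the login node's address.
    # "0.0.0.0" (all interfaces) is an assumption; see the note above.
    set_trace(term_size=(80, 24), host="0.0.0.0", port=port)
```

Calling break_here() in aa.py prints the telnet command into the job output before the script blocks waiting for the connection, so you can copy it to the login node.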
Thank you so much for this amazing debugger and for all the support. I am very new to Python debugging and pudb, and my problem is that pudb does not work normally on the HPC cluster. I use SLURM for job scheduling on the cluster.
When I request an interactive job on the compute nodes, I run:
salloc --partition=cpu --nodes=1 --time=01:00:00
srun python aa.py
In the aa.py file, I start with from pudb import set_trace; set_trace().
The debugger shows up as in Image(1).
It does not move when I hit "n", even if I hit "n" multiple times. It only moves when I hit "enter" --> image(33)
I also can't type in the command line by hitting "CTRL+X" --> image(44)
Why is that happening, and how can I fix it?
PS: The debugger works fine on the login node (but not on the compute nodes).
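
One way to narrow this down is to check whether the process started by srun is attached to a real terminal, since a full-screen debugger needs one to read single key presses. This is only a diagnostic sketch, and a missing terminal is a guess at the cause, not something confirmed in this thread. The function name terminal_report and the file name check_tty.py are hypothetical.

```python
import os
import sys


def terminal_report(stdin=sys.stdin, stdout=sys.stdout, environ=os.environ):
    """Summarize whether the process looks attached to an interactive terminal."""
    return {
        "stdin_is_tty": stdin.isatty(),    # False when input is a pipe or file
        "stdout_is_tty": stdout.isatty(),  # False when output is redirected
        "TERM": environ.get("TERM"),       # terminal type, None if unset
    }


if __name__ == "__main__":
    for key, value in terminal_report().items():
        print(f"{key}: {value}")
```

Save it as check_tty.py and run it once with srun python check_tty.py and once with srun --pty python check_tty.py. If only the second run reports a tty, launching aa.py with srun --pty may be worth trying.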