Monday, 8 September 2008

name resolution issue solved

host -v `cat /var/spool/torque/server_name`
momctl is a diagnostic tool for the MOM daemon. Run it from the server:
momctl -h cl1n001 -d 3
The problem was that the file /var/spool/torque/server_name defined the wrong server. It had been set automatically to the external server name, "quartz" [158.42.92.158], but the correct configuration points to the internal server name, "admin-net" [10.0.10.1].

Problem solved....
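A minimal sketch of how this check could be scripted, so that a wrong server_name is caught early. The function name check_server_name is hypothetical (it is not part of TORQUE); it only compares the contents of the file with the name you expect.

```shell
#!/bin/sh
# Hypothetical helper, not part of TORQUE: compare the name stored in a
# server_name file with the name the nodes are expected to use.
check_server_name() {
    file=$1
    expected=$2
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # The file holds a single hostname; strip whitespace and line endings.
    actual=$(tr -d ' \t\r\n' < "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file points to $actual"
    else
        echo "MISMATCH: $file has '$actual', expected '$expected'"
        return 1
    fi
}
```

On this cluster the call would be something like: check_server_name /var/spool/torque/server_name admin-net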

Useful links:
http://www.clusterresources.com/torquedocs21/10.1troubleshooting.shtml#15034
http://www.clusterresources.com/torquedocs21/1.2basicconfig.shtml




Friday, 5 September 2008

name resolution issue with TORQUE

Check that the communications are OK


host -v `cat /var/spool/torque/server_name`
ssh -x c0-0 ping `cat /var/spool/torque/server_name`
ssh -x c0-0 host -v `cat /var/spool/torque/server_name`
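Note that in the ssh commands above the backticks are expanded by the local shell, so the node is tested against the name stored on the machine where you type the command. The sketch below is one possible variant that lets each node read its own copy of the file. The function check_node and the REMOTE variable are hypothetical names, and ping -c 1 assumes a Linux-style ping.

```shell
#!/bin/sh
# REMOTE is the command used to reach a node; "ssh -x" as in the notes above.
REMOTE=${REMOTE:-ssh -x}

# Hypothetical helper: ask a node to ping the server named in ITS OWN
# server_name file.
check_node() {
    node=$1
    file=${2:-/var/spool/torque/server_name}
    # The escaped \$( ... ) is passed literally, so the command substitution
    # runs on the node rather than on the local machine.
    if $REMOTE "$node" "ping -c 1 \$(cat $file)" >/dev/null 2>&1; then
        echo "$node: server reachable"
    else
        echo "$node: server NOT reachable"
        return 1
    fi
}
```

For example: check_node c0-0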

Thursday, 4 September 2008

Installing torque

Torque installation for a pure 64-bit machine
Brute-force approach to get rid of PBSPro.
Not very subtle, but it seems to work:
$]rm -rf /usr/pbs
$]rm -rf /var/spool/PBS
Or, to uninstall a previous version of Torque:
$]cd /opt/torque.x.x.x/
$]make uninstall
Configuring the Torque installation for a pure 64-bit machine
$]tar -xzvf torque.tar.gz
$]cd torque.x.x.x
$]./configure CC=/opt/intel/cce/10.1.015/bin/icc F77=ifort
$]make
$]make install
$]./torque.setup USER [root/user]
$]
To create self-extracting packages for installation on the compute nodes, do the following:
$]make packages
Finally, distribute and install the packages on the nodes
$]
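One way the distribution step could look, as a sketch only. make packages writes self-extracting scripts that accept --install; the exact file name (for example torque-package-mom-linux-x86_64.sh) depends on the build and is an assumption here, as are the function name install_on_nodes, the SCP and SSH variables, and the /tmp staging directory.

```shell
#!/bin/sh
# Commands used for copying and remote execution; override if needed.
SCP=${SCP:-scp}
SSH=${SSH:-ssh}

# Hypothetical helper: install_on_nodes PACKAGE NODE...
# Copies a self-extracting package to each node and runs it with --install.
install_on_nodes() {
    pkg=$1
    shift
    name=$(basename "$pkg")
    for node in "$@"; do
        $SCP "$pkg" "$node:/tmp/" || { echo "$node: copy failed" >&2; return 1; }
        $SSH "$node" "/tmp/$name --install" || { echo "$node: install failed" >&2; return 1; }
        echo "$node: installed $name"
    done
}
```

For example: install_on_nodes ./torque-package-mom-linux-x86_64.sh c0-0 c0-1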