Cloud infrastructure in a nutshell

1. Purpose of this guide

This guide provides a complete walk-through of how to install and configure a cloud manager and run distributed jobs through a simple queue manager.

The main components are built around:

2. Infrastructure

All of the following steps are presented for this kind of infrastructure:

 
3. Create the first virtual machine for the OpenNebula manager (nebula-manager-groupxx)

As a first step, we install the OpenNebula software on a persistent virtual machine created on the jnws024 hypervisor.

3.1 Start Virtual Machine Manager on your virtual desktop

Connect to your virtual desktop at the URL http://YOURHOST.ihep.ac.cn:8080/vnc.html?host=YOURHOST.ihep.ac.cn&port=8080

Open a Terminal (Applications >> System Tools >> Terminal) and install Virtual Machine Manager with "yum install virt-manager". Accept the installation of the required packages and follow the instructions from Virtual Machine Manager. At the end of the installation, restart the libvirtd daemon:

 

# service libvirtd restart

 

After this operation, you can start virt-manager:

 

# LANG=C virt-manager

 

 

Create a new connection (File >> Add connection ...) to jnws024 as user "root" (password ihep;test), and use this connection for the next steps.

 
3.2 Create a Virtual Machine

Click on the appropriate icon to create a new VM, give it a name according to your group number (nebula-manager-groupYOUR_GROUP_NUMBER), and choose Network Install.

 

Enter the URL: http://202.122.33.67/yum/scientific/6.5/x86_64/os/

 

Set the RAM to 2048 MB and 1 CPU.

 

Select a new disk image of 20 GB, entirely allocated.

 

Verify that all the parameters are correct, especially the bridged networking.

 

Click on Finish and switch to the console tab.

 
3.3 Start the installation of the operating system

Choose the English language and the US keyboard type, then choose manual configuration (use the arrow keys and the spacebar):

 

Enter the network parameters of your group:

 

Move on to the next screen and confirm.

 

Select Yes, discard any data

 

Give it a name: nebula-manager-groupYOUR_GROUP_NUMBER.ihep.ac.cn (remember to use the FQDN!)

 

Set the root password: ihep;test

 

Continue by pressing Next

 

Confirm the disk layout and press Next

 

Confirm you want to Write changes to disk …

 

The Basic Server layout is enough for us.

 

Wait a few minutes; when the installation is finished, click on Reboot.

 

Congratulations, your operating system is ready. After the reboot, try to access your new Nebula Manager virtual machine via ssh from your laptop (using PuTTY, MobaXterm, ...):

ssh -l root 192.168.64.XXX

 

Configure the NTP server: delete all the server lines in /etc/ntp.conf and add this one:

server ntp1.ihep.ac.cn
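If you prefer to do this from the command line rather than with an editor, a minimal sketch (assuming the stock /etc/ntp.conf, where the default entries all start with "server") is:

# sed -i 's/^server /#server /' /etc/ntp.conf

# echo "server ntp1.ihep.ac.cn" >> /etc/ntp.conf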

 

and adjust the date:

# service ntpd stop

# ntpdate ntp1.ihep.ac.cn

# service ntpd start

 

Disable SELinux immediately:

 

# echo 0 > /selinux/enforce

 

and make this modification persistent by opening the file /etc/selinux/config and changing the SELINUX=xxxxx line to:

 

SELINUX=disabled
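The same change can also be made non-interactively; a one-line sketch, assuming the standard /etc/selinux/config layout:

# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config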

 

3.4 Configure a global proxy to gain Internet access

 

Add these three lines at the end of the file /root/.bashrc on your nebula manager:

 

export http_proxy=202.122.33.53:3128

export ftp_proxy=202.122.33.53:3128

export https_proxy=202.122.33.53:3128

 

And test immediately; if the curl command output looks like this, it works:

 

# source .bashrc

# curl https://wtfismyip.com/text

202.122.33.53

 

3.5 Start the installation of OpenNebula on the nebula manager node

Detailed info can be found at http://docs.opennebula.org/4.8/design_and_installation/building_your_cloud/ignc.html

Add the OpenNebula repository in the file /etc/yum.repos.d/opennebula.repo:

[opennebula]

name=opennebula

baseurl=http://downloads.opennebula.org/repo/4.8/CentOS/6/x86_64

enabled=1

gpgcheck=0
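If you prefer to create the repository file from the shell, a heredoc with the same content (just a convenience, equivalent to editing the file) works as well:

# cat > /etc/yum.repos.d/opennebula.repo <<EOF
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/4.8/CentOS/6/x86_64
enabled=1
gpgcheck=0
EOF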

 

Add the EPEL repository and accept the import of all GPG keys:

# yum install epel-release

 

and install the software, confirming the EPEL GPG key:

# yum install opennebula-server opennebula-sunstone opennebula-ruby

 

Complete the installation of the Ruby gems by selecting 0:

# /usr/share/one/install_gems

Select your distribution or press enter to continue without

installing dependencies.

 

0. CentOS/RedHat

1. Ubuntu/Debian

2. SUSE

       

0

...

...

...

Building native extensions.  This could take a while...

Successfully installed curb-0.8.8

Successfully installed builder-3.2.2

Successfully installed trollop-2.1.2

Successfully installed polyglot-0.3.5

Successfully installed treetop-1.6.2

Successfully installed parse-cron-0.1.4

Successfully installed multi_json-1.11.1

Successfully installed jmespath-1.0.2

Successfully installed aws-sdk-core-2.1.1

Successfully installed aws-sdk-resources-2.1.1

Successfully installed aws-sdk-2.1.1

Building native extensions.  This could take a while...

Successfully installed ox-2.2.0

16 gems installed

 

 

Start OpenNebula and verify the installation:

 

# su - oneadmin

$ one start

$ onevm list

    ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME

 

Write down the oneadmin password (it will be useful later):

 

$ cd .one

$ ll

total 4

-rw-r--r--. 1 oneadmin oneadmin 42 Jun 23 20:40 one_auth

$ cat one_auth

oneadmin:4689b636b7bde5565a0d95e865bc4609

 

4. Configure your host to be a hypervisor

First of all, we need to decide what kind of virtual network infrastructure we want. A simple solution is to use a virtual bridge to connect the VMs to the physical network.

 

Connect to your hypervisor via the virtual desktop http://YOURHOST.ihep.ac.cn:8080/vnc.html?host=YOURHOST.ihep.ac.cn&port=8080 (or via an ssh client from your laptop).

Disable NetworkManager permanently:

# service NetworkManager stop

# chkconfig NetworkManager off

 

Launch virt-manager:

 

# LANG=C virt-manager

 

 

And connect to localhost (add a new connection if necessary), NOT to jnws024!

 

Configure a virtual bridge (named bridged) on this host using virt-manager (Edit >> Connection details >> Network interfaces).

 

 

Add a network interface (+) and choose the Bridge type:

 

 

Click on Forward and complete the form as follows.

 

Click Configure next to IP settings and add the network parameters of your hypervisor.

 

Before clicking on Finish, ask the teacher to check your settings.

Complete the operation by pressing Finish.

Verify that the bridge is correctly configured with the brctl command, and set the AGEING parameter to 0 (if you skip this step you may experience random problems later; remember to re-run this command after any reboot of the hypervisor):

 

# brctl setageing bridged 0
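Because this setting does not survive a reboot, one option (a suggestion, not part of the original procedure) is to append the command to /etc/rc.local so it is re-applied at boot, and to check the bridge with brctl show:

# echo "brctl setageing bridged 0" >> /etc/rc.local

# brctl show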

 

4.1 Create a Virtual Machine via virt-manager on your hypervisor

Follow the steps used for the installation of the nebula-manager: create a Virtual Machine, use a network installation, give it the name templatevm, and install a base system (pay attention: unlike in step 3.2, we now want to use the QCOW2 disk image format, and the connection is localhost instead of jnws024!).

To specify a qcow2 image you need to click on 'Select managed or other existing storage', go into default under Storage Pools, and then click New Volume:

 

 

Click on Browse... and New Volume to create a qcow2 image, fill in the form as follows, click Finish and then Choose Volume.

 

Now you can continue the installation process as usual.

 

 

After the reboot, connect to this new virtual machine:

 

ssh -l root <IP_NEW_VM>

 

Configure the proxy as done before and install the OpenNebula packages required for contextualization (http://dev.opennebula.org/projects/opennebula/files):

 

# wget http://dev.opennebula.org/attachments/download/804/one-context_4.8.0.rpm

# yum localinstall one-context_4.8.0.rpm

 

Now we can shut down this virtual machine and use its disk as the "golden image" from which new virtual machines will be instantiated in the future. This will be done after a few more steps.

 

# init 0

 

4.2 Configure your hypervisor (jnws0XX) to become a suitable host for nebula

More info can be found at http://docs.opennebula.org/4.8/design_and_installation/building_your_cloud/ignc.html#step-5-node-installation

Configure the proxy server if needed and install some specific software from the OpenNebula repositories. First, add a new repository in the file /etc/yum.repos.d/opennebula.repo:

[opennebula]

name=opennebula

baseurl=http://downloads.opennebula.org/repo/4.8/CentOS/6/x86_64

enabled=1

gpgcheck=0

 

# yum install opennebula-node-kvm

 

Restart libvirtd (ignore errors on stopping):

 

# service libvirtd restart

 

Add the nebula manager IP address to the file /etc/hosts:

 

# echo 192.168.64.XX nebula-manager-groupxx >> /etc/hosts

 

and verify with the ping nebula-manager-groupxx command.

4.3 Configure SSH passwordless access between nebula-manager-groupXX and your hypervisor

Log in as "root" on nebula-manager-groupXX and create an RSA key pair for the oneadmin user (confirm the defaults):

 

# su - oneadmin

$ ssh-keygen

 

Add some ssh client options to the file /var/lib/one/.ssh/config:

 

ConnectTimeout 5

Host *

        StrictHostKeyChecking no

 

And set the right permissions:

$ chmod 700 /var/lib/one/.ssh/config

 

Disable the http proxy for the oneadmin user and choose an editor by adding these lines to the .bashrc file:

unset http_proxy

unset ftp_proxy

unset https_proxy

EDITOR=nano

export EDITOR

 

Make oneadmin trust its own public key:

$ cat /var/lib/one/.ssh/id_rsa.pub >> /var/lib/one/.ssh/authorized_keys

 

Create an archive of the ssh settings and copy it to your hypervisor:

 

$ tar -Pcvf /tmp/ssh_oneadmin.tar /var/lib/one/.ssh

/var/lib/one/.ssh/

/var/lib/one/.ssh/id_rsa.pub

/var/lib/one/.ssh/id_rsa

/var/lib/one/.ssh/authorized_keys

/var/lib/one/.ssh/known_hosts

/var/lib/one/.ssh/config

/var/lib/one/.ssh/id_dsa

/var/lib/one/.ssh/id_dsa.pub

 

$ scp /tmp/ssh_oneadmin.tar root@jnws0XX:/tmp/

$ ssh root@jnws0XX tar -Pxvf /tmp/ssh_oneadmin.tar

 

Test the passwordless connection:

$ ssh jnws0XX uname -a

Linux jnws0XX 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 04:15:08 CET 2013 x86_64 x86_64 x86_64 GNU/Linux

 

4.4 Mount the Datastore on nebula-manager-groupxx and on the hypervisor

On both nebula-manager-groupxx and your hypervisor jnws0XX, mount the NFS datastore at the same location:

# echo jnws024:/Datastore/groupxx /var/lib/one/datastores nfs mountvers=3,defaults,_netdev 0 0 >> /etc/fstab

# mkdir -p /var/lib/one/datastores

# mount -a

# cd /var/lib/one/datastores

# mkdir -p 0

# mkdir -p 1

# chown oneadmin. .

# chown oneadmin. 0 1
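A quick sanity check that the NFS datastore is really mounted and owned by oneadmin (run it on both machines; the sizes will differ in your setup):

# df -h /var/lib/one/datastores

# ls -ld /var/lib/one/datastores/0 /var/lib/one/datastores/1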

 

5. Configure the Datastore on nebula-manager-groupxx and complete the cloud manager setup

Edit the default datastore (ID=1) with the right parameters as user "oneadmin":

$ onedatastore update 1

BASE_PATH="/var/lib/one//datastores/"

CLONE_TARGET="SYSTEM"

DISK_TYPE="FILE"

DS_MAD="fs"

LN_TARGET="NONE"

TM_MAD="qcow2"

TYPE="IMAGE_DS"

 

Add your hypervisor jnws0XX under the control of nebula-manager-groupxx:

 

$ onehost create jnws0XX -i kvm -v kvm -n dummy

ID: 0

$ onehost list

  ID NAME            CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT  

   0 jnws0XX        -           0                  -                  - init

 

$ onehost list

  ID NAME            CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT  

   0 jnws0XX        -           0      0 / 1600 (0%)    0K / 11.7G (0%) on    

 

Configure a virtual network (http://docs.opennebula.org/4.8/user/virtual_resource_management/vgg.html): create a file named vnet.txt with this information, paying attention to the IP values and checking them against your own parameters:

NAME        = "Private Network"

DESCRIPTION = "A private network for VM inter-communication"

 

BRIDGE = "bridged"

 

# Context attributes

NETWORK_ADDRESS = "10.10.0.0"

NETWORK_MASK    = "255.255.255.0"

DNS             = "XXX.XXX.XXX.XXX"

GATEWAY         = "10.10.0.1"

 

#Address Ranges, only these addresses will be assigned to the VMs

AR=[

    TYPE = "IP4",

    IP   = "10.10.0.xx",

    SIZE = "15"

]

 

 

 

Use this file to define a virtual network for your cloud:

$ onevnet create vnet.txt

ID: 0

$ onevnet list

  ID USER            GROUP        NAME                CLUSTER    BRIDGE   LEASES

   0 oneadmin        oneadmin     Private Network     -          bridged       0

 

Add your first VM image (the one created earlier with virt-manager).

(http://docs.opennebula.org/4.8/user/virtual_resource_management/img_guide.html)

Copy the disk image from the standard libvirt path on your hypervisor to nebula-manager-groupxx:

$ scp root@jnws0XX:/var/lib/libvirt/images/templateVM.img /tmp/

 

Import the VM disk as an image. Create a file named slc65_img.one like this (if you use cat, terminate the input with Ctrl-D):

 

$ cat > slc65_img.one

NAME          = "SLC65"

PATH          = /tmp/templateVM.img

TYPE          = OS

DESCRIPTION   = "SLC65 base installation."

 

And use it to register the operating system image:

$ oneimage create slc65_img.one --datastore default

 

Wait for the copy to finish; you can monitor it with dstat:

 

$ dstat

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--

usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw

  2   1  95   1   0   0| 104k  208k|   0     0 |   0     0 | 208   841

  1  18   0  76   0   5|   0    48k|  45M   74M|   0     0 |5541  6016

  4  37   0  50   0   9|2112k  152k| 102M   13M|   0     0 |6803  9070

  3  32   0  53   0  11|   0     0 |  82M   51M|   0     0 |6446  8045

...
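You can also follow the image state with oneimage list: the image is usable once its state changes from locked to ready.

$ oneimage list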

 

Create your first template (http://docs.opennebula.org/4.8/user/virtual_resource_management/vm_guide.html). Create a file named vm.txt with this information, checking it against your own parameters (in particular, IMAGE_ID must match the ID of the image you just created, as shown by oneimage list):

 

NAME   = vm<YOUR GROUPNUMBER>

MEMORY = 1024

CPU    = 1

 

DISK = [ IMAGE_ID=7, DRIVER="qcow2", BUS="virtio", DEV_PREFIX="vd", TARGET="vda" ]

 

NIC = [ NETWORK = "Private Network", NETWORK_UNAME="oneadmin" ]

 

CONTEXT=[ HOSTNAME="vm<YOUR GROUPNUMBER>-$VMID.ihep.ac.cn",FILES="/var/lib/one/init.sh", NETWORK=YES ]

 

GRAPHICS = [

  TYPE    = "vnc",

  LISTEN  = "0.0.0.0"]

 

Use this file to define your first template:

$ onetemplate create vm.txt

 

And now let's try to instantiate your first VM in the cloud:

 

$ onetemplate instantiate 0

$ onevm list

    ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME

     0 oneadmin oneadmin test-vm-0       runn    0      0K jnws0XX     0d 00h00
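To find out which IP address was leased to the new VM (needed for ssh access), inspect it with onevm show and look at the NIC / context information (0 here is the VM ID reported by onevm list):

$ onevm show 0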

5.1 Enable the web interface - Sunstone

In order to enable the Sunstone web interface, install additional software on nebula-manager-groupxx (more info at http://docs.opennebula.org/4.8/administration/sunstone_gui/sunstone.html):

 

# /usr/share/one/install_gems sunstone

# yum install novnc

 

Edit /etc/one/sunstone-server.conf and modify the host line as follows:

:host: 0.0.0.0

 

Disable the firewall:

 

# iptables -F

# chkconfig iptables off

 

Start Sunstone as user oneadmin:

 

$ sunstone-server start

 

Connect to the Sunstone web interface from your laptop, using the oneadmin password saved earlier:

http://<IP ADDRESS OF YOUR NEBULA MANAGER>:9869/

login : oneadmin

password : 4689b636b7bde5565a0d95e865bc4609

 

6. Contextualization

Contextualization can be done via an external script executed on the VM during the boot procedure (more information at http://docs.opennebula.org/4.8/user/virtual_machine_setup/cong.html). In this script, software installation, custom configuration and many other tasks can be automated, without the need to create a new image and/or a new template. In this way, changes to the VM configuration can be applied much faster.

Create a custom init.sh script as user oneadmin on nebula-manager-groupxx to install CVMFS at the first boot (check the CVMFS_HTTP_PROXY value!):

#!/bin/bash

 

setup_cvmfs(){

    # fuse      

    yum -y install fuse

    yum clean all

    # cvmfs repo

    cd /etc/yum.repos.d/

    wget http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo

    # repo key

    cd /etc/pki/rpm-gpg/

    wget http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM

    # install cvmfs stuff

    yum -y install cvmfs cvmfs-init-scripts cvmfs-auto-setup cvmfs-keys

    # configure cvmfs

    touch /etc/cvmfs/default.local

    cat > /etc/cvmfs/default.local <<EOF

CVMFS_REPOSITORIES=boss.cern.ch

CVMFS_HTTP_PROXY="http://10.10.0.99:8080"

# CVMFS_CACHE_DIR=/scratch/cvmfs/boss

CVMFS_CACHE_DIR=/var/cvmfs-cache

CVMFS_CACHE_BASE=/var/cvmfs-cache

# CVMFS_QUOTA_LIMIT=30720

CVMFS_QUOTA_LIMIT=5700

EOF

 

    touch /etc/cvmfs/config.d/boss.cern.ch.conf

    chmod ugo+x /etc/cvmfs/config.d/boss.cern.ch.conf  

    mkdir -p /var/cvmfs-cache

    chown cvmfs. /var/cvmfs-cache -R

    cat  > /etc/cvmfs/config.d/boss.cern.ch.conf <<EOF  

#!/bin/sh

repository_start() {

   [ ! -L /opt/boss ] && ln -s /cvmfs/boss.cern.ch /opt/boss

}

repository_stop() {

   [ -L /opt/boss ] && rm -f /opt/boss

}

EOF

    ln -s /cvmfs/boss.cern.ch /opt/boss

 

    cvmfs_config reload

    # restart autofs

    echo "Start cvmfs services"

    service autofs restart

    cvmfs_config probe

 

    # avoid automatic updates of cvmfs

    echo "exclude=cvmfs*" >> /etc/yum.conf

 

}

 

(

export http_proxy=202.122.33.53:3128
export ftp_proxy=202.122.33.53:3128

export https_proxy=202.122.33.53:3128

echo "remove selinux"

echo 0 > /selinux/enforce

echo "*************Setup CVMFS"

date

# cvmfs

setup_cvmfs

 

) 2>&1 | tee -a /var/log/context.log

 

 

Edit your template, adding the FILES option to the CONTEXT section:

$ onetemplate update 0

...

CONTEXT=[ HOSTNAME="vm<YOUR GROUPNUMBER>-$VMID.ihep.ac.cn",FILES="/var/lib/one/init.sh", NETWORK=YES ]

...

 

Instantiate a new VM and check whether CVMFS is working. Connect via ssh to the new VM and enter these commands:

# cd /opt/boss

# df -h .

Filesystem      Size  Used Avail Use% Mounted on

cvmfs2          5.6G   46M  5.6G   1% /cvmfs/boss.cern.ch

7. Install the Torque PBS server on nebula-manager-groupxx

In order to install the PBS server on nebula-manager-groupxx, you need to install some additional software:

# yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel

 

Now download the Torque PBS sources (http://wpfilebase.s3.amazonaws.com/torque/torque-2.5.13.tar.gz), then build and install:

# wget http://wpfilebase.s3.amazonaws.com/torque/torque-2.5.13.tar.gz

# tar -zxf torque-2.5.13.tar.gz

# cd torque-2.5.13/

# ./configure

# make

# make install

 

Add a line with the IP address of your nebula-manager-groupxx to /etc/hosts:

<IP ADDRESS OF YOUR NEBULA MANAGER> nebula-manager-groupxx.ihep.ac.cn nebula-manager-groupxx

 

Save all the configuration parameters in a file /var/spool/torque/server.conf, editing the server managers and server operators lines:

delete queue long

 

create queue long

set queue long queue_type = Execution

set queue long resources_max.cput = 72:00:00

set queue long resources_max.walltime = 72:00:00

set queue long Priority = 100

 

# equal to the max number of cores

# can be modified later: qmgr -c "set queue long max_running = XXX"

set queue long max_running = 6

 

set queue long enabled = True

set queue long started = True

 

# Set server attributes.

#

set server scheduling = True

set server managers = root@nebula-manager-groupxx.ihep.ac.cn

set server operators = root@nebula-manager-groupxx.ihep.ac.cn

set server log_events = 511

set server mail_from = adm

set server query_other_jobs = True

set server scheduler_iteration = 600

set server node_check_rate = 150

set server tcp_timeout = 6

set server node_pack = False
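Remember to replace the groupxx placeholders with your actual group number before loading the file; a hypothetical one-liner:

# sed -i 's/groupxx/group<YOUR GROUP NUMBER>/g' /var/spool/torque/server.conf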

 

Initialize the server database, start the PBS daemons, and then load the configuration:

# pbs_server -t create

 

# qterm -t quick

 

# pbs_server

# pbs_sched

# pbs_mom

 

 

# cat /var/spool/torque/server.conf | qmgr

Max open servers: 10239

qmgr obj=long svr=default: Unknown queue

 

Check the server status (the "Unknown queue" message above is expected on the first run, since the configuration begins by deleting a queue that does not exist yet):

# qmgr -c "list server"

Server nebula-manager-groupxx.ihep.ac.cn

        server_state = Active

        scheduling = True

        total_jobs = 0

        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0

        acl_hosts = nebula-manager

        managers = root@nebula-manager-groupxx

        operators = root@nebula-manager-groupxx

        log_events = 511

        mail_from = adm

        query_other_jobs = True

        scheduler_iteration = 600

        node_check_rate = 150

        tcp_timeout = 6

        node_pack = False

        pbs_version = 2.5.13

        next_job_number = 0

        net_counter = 2 2 2

 

Check the queue configuration (queue name: long):

# qmgr -c "list queue long"

Queue long

        queue_type = Execution

        Priority = 100

        total_jobs = 0

        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0

        max_running = 6

        resources_max.cput = 72:00:00

        resources_max.walltime = 72:00:00

        mtime = Mon Jul 13 21:36:56 2015

        enabled = True

        started = True

 

Check the queue status:

# qstat -q

 

server: nebula-manager

 

Queue            Memory CPU Time Walltime Node  Run Que Lm  State

---------------- ------ -------- -------- ----  --- --- --  -----

long               --   72:00:00 72:00:00   --    0   0  6   E R

                                               ----- -----

                                                   0     0

7.1 Modify init.sh to install the Torque PBS client automatically

Installation on the client requires some more work: create some users to run jobs, mount the shared /home (exported by the storage1 NFS server), install Torque PBS, and start the client components.

#!/bin/bash

 

 

setup_cvmfs(){

    # fuse      

    yum -y install fuse

    yum clean all

    # cvmfs repo

    cd /etc/yum.repos.d/

    wget http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo

    # repo key

    cd /etc/pki/rpm-gpg/

    wget http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM

    # install cvmfs stuff

    yum -y install cvmfs cvmfs-init-scripts cvmfs-auto-setup cvmfs-keys

    # configure cvmfs

    touch /etc/cvmfs/default.local

    cat > /etc/cvmfs/default.local <<EOF

CVMFS_REPOSITORIES=boss.cern.ch

CVMFS_HTTP_PROXY="http://10.10.0.XX:8080"

# CVMFS_CACHE_DIR=/scratch/cvmfs/boss

CVMFS_CACHE_DIR=/var/cvmfs-cache

CVMFS_CACHE_BASE=/var/cvmfs-cache

# CVMFS_QUOTA_LIMIT=30720

CVMFS_QUOTA_LIMIT=5700

EOF

 

    touch /etc/cvmfs/config.d/boss.cern.ch.conf

    chmod ugo+x /etc/cvmfs/config.d/boss.cern.ch.conf  

    mkdir -p /var/cvmfs-cache

    chown cvmfs. /var/cvmfs-cache -R

    cat  > /etc/cvmfs/config.d/boss.cern.ch.conf <<EOF

#!/bin/sh

repository_start() {

   [ ! -L /opt/boss ] && ln -s /cvmfs/boss.cern.ch /opt/boss

}

repository_stop() {

   [ -L /opt/boss ] && rm -f /opt/boss

}

EOF

    ln -s /cvmfs/boss.cern.ch /opt/boss

 

    cvmfs_config reload

    # restart autofs

    echo "Start cvmfs services"

    service autofs restart

    cvmfs_config probe

 

    # avoid automatic updates of cvmfs

    echo "exclude=cvmfs*" >> /etc/yum.conf

 

}

 

setup_pbs() {

        # install software

        yum -y install libxml2-devel openssl-devel gcc gcc-c++ boost-devel

        wget http://wpfilebase.s3.amazonaws.com/torque/torque-2.5.13.tar.gz

        tar -zxf torque-2.5.13.tar.gz

        cd torque-2.5.13/

        ./configure

        make

        make install

        # add entry into /etc/hosts ... replace DNS

        echo "<IP ADDRESS OF YOUR NEBULA MANAGER> nebula-manager-groupxx.ihep.ac.cn nebula-manager-groupxx" >> /etc/hosts

        # config client

        cat > /var/spool/torque/mom_priv/config <<EOF

\$pbsserver      nebula-manager-groupxx

\$logevent       255

EOF

        # start pbs_mom

        /usr/local/sbin/pbs_mom       

}

 

function other()

{

    # switch off iptables ;(

    service iptables stop

    chkconfig iptables off

 

    # switch off selinux

    setenforce 0

    # remove autoupdate

    /sbin/service yum-autoupdate stop

    /sbin/chkconfig --del yum-autoupdate

    # setting hostname

   hostname $HOSTNAME

    # add entry to hosts

   echo $ETH0_IP $HOSTNAME >> /etc/hosts

        # make some batch user

        groupadd -g 500 pluto

        useradd -u 500 -g pluto -M pippo

        # mount /home

        echo "jnws024:/SharedHome/groupxx  /home nfs  mountvers=3,defaults,_netdev 0 0 " >> /etc/fstab

        mount -a

}

 

(

export http_proxy=202.122.33.53:3128
export ftp_proxy=202.122.33.53:3128

export https_proxy=202.122.33.53:3128

echo "remove selinux"

echo 0 > /selinux/enforce

echo "*************Setup CVMFS"

date

# disable autoupdate

other

# cvmfs

setup_cvmfs

# pbs

setup_pbs

# print env

env

) 2>&1 | tee -a /var/log/context.log

 

Instantiate a new VM and verify that it boots properly, that CVMFS is working, and that /home is NFS-mounted.
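Once logged in on the new VM, a quick check for both (a sketch; listing the repository also triggers the CVMFS automount):

# df -h /home

# ls /cvmfs/boss.cern.ch | head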

Add the new node to the /etc/hosts file on the server (nebula-manager-groupxx):

 

<IP ADDRESS OF YOUR NEW VM> vm<YOUR GROUPNUMBER>-<NAME OF YOUR NEW VM>

 

Add the node to the PBS server node list by adding a line to /var/spool/torque/server_priv/nodes:

vm<YOUR GROUPNUMBER>-<NAME OF YOUR NEW VM> np=1

 

 

Restart the Torque PBS daemons:

 

# killall -9 pbs_server

# killall -9 pbs_sched

# killall -9 pbs_mom

# pbs_server

# pbs_sched

# pbs_mom

 

Test the visibility of the Torque PBS client from the server side (if necessary, restart pbs_mom on the client):

 

# pbsnodes

vmXX

     state = free

     np = 1

     ntype = cluster

     status = rectime=1436821133,varattr=,jobs=,state=free,netload=225458219,gres=,loadave=0.00,ncpus=1,physmem=1020392kb,availmem=2956520kb,totmem=3117536kb,idletime=1160,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux vm26 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 04:15:08 CET 2013 x86_64,opsys=linux

     gpus = 0

 

7.2 Play with Torque PBS

Mount /home on your nebula manager, and create a batch user to run jobs:

# echo "jnws024:/SharedHome/groupXX /home nfs  mountvers=3,defaults,_netdev 0 0 " >> /etc/fstab

# mount -a

# groupadd -g 500 pluto

# useradd -u 500 -g pluto -m pippo

 

As user pippo (e.g. after su - pippo), create an RSA key pair and trust the user's public key and the host key:

$ ssh-keygen

$ cd .ssh

$ cat id_rsa.pub > authorized_keys

$ chmod 644 authorized_keys

$ ssh pippo@nebula-manager-groupXX.ihep.ac.cn

The authenticity of host 'nebula-manager-groupXX.ihep.ac.cn (192.168.64.145)' can't be established.

RSA key fingerprint is 4d:44:b6:52:3b:35:c2:68:9f:b1:55:dd:07:de:b1:fa.

Are you sure you want to continue connecting (yes/no)? yes

 

Create a test job as user pippo: create a file named test.sh like this:

#!/bin/bash

sleep 100s

 

pwd

 

hostname

 

Give it the right permissions:

$ chmod 755 test.sh

 

 

Send many jobs of the same kind:

$ for var in `seq 1 10` ; do qsub -q long test.sh -j oe; sleep 1; done

12.nebula-manager-groupxx

13.nebula-manager-groupxx

14.nebula-manager-groupxx

15.nebula-manager-groupxx

16.nebula-manager-groupxx

17.nebula-manager-groupxx

18.nebula-manager-groupxx

19.nebula-manager-groupxx

20.nebula-manager-groupxx

21.nebula-manager-groupxx

and watch the queue:

$ qstat -n1

 

nebula-manager:

                                                                                        Req'd    Req'd       Elap

Job ID                        Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time

----------------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------

12.nebula-manager-groupxx       pippo       long     test.sh            9201   --     --     --        --  R  00:00:00   vmXX/0

13.nebula-manager-groupxx       pippo       long     test.sh            9206   --     --     --        --  R  00:00:00   vmXX/0

14.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

15.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

16.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

17.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

18.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

19.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

20.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --

21.nebula-manager-groupxx       pippo       long     test.sh             --    --     --     --        --  Q       --     --
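When the jobs finish, their output is copied back to the submission directory in files named after the script and job ID (stderr is merged into stdout because of the -j oe option); for example:

$ ls test.sh.o*

$ cat test.sh.o12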

 

Configure BOSS and experiment with more job submissions.