OpenOnDemand

From Define Wiki
Revision as of 11:42, 11 June 2019 by Antony (talk | contribs) (tweaks before final testing)

Setting up Open OnDemand

What is it?

Open OnDemand is a web portal that provides browser-based applications to make any compute cluster easier to use.

Dashboard.png

It provides the following apps out of the box, which I shall configure using my trusty virtual TrinityX cluster on my file server.

  1. Home directory browser with editor
  2. Active Job Viewer
  3. Job creator
  4. Cluster Shell
  5. Remote Visualisation

Here are screenshots of each of the working modules:

OOnDemandFilesBrowser.png OOnDemandActiveJobViewer.png OOnDemandJobCreator.png OOnDemandWebTerminal.png OOnDemandRemoteDesktop1.png OOnDemandRemoteDesktop2.png

Install RPMs on Portal node

yum install centos-release-scl
yum install https://yum.osc.edu/ondemand/1.5/ondemand-release-web-1.5-1.el7.noarch.rpm
yum install ondemand
systemctl start httpd24-httpd

Configure The Portal Node

Configure Authentication to use LDAP and allow remote access to nodes using reverse proxy

Back up the original blank config and configure basic LDAP authentication against the cluster LDAP for now. If you want something more advanced, read the OOD docs. Note: fill in the LDAP URL according to the cluster config on your system. You can normally find this in the sssd.conf file, along with the search DN, which is appended before the ?uid below. In a default TrinityX cluster the URL will be <controller hostname>.cluster:636/ou=People,dc=local?uid. The .cluster part is the internal domain name and cannot be removed, as doing so fails the SSL host checks.
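As a sanity check before editing the config, the URL can be composed from the pieces you pull out of sssd.conf. The hostname and search base below are the TrinityX defaults mentioned above; treat them as assumptions for any other cluster:

```shell
# Compose the AuthLDAPURL from values found in sssd.conf
# (controller name and search base are TrinityX defaults; adjust for your site)
controller="trinityx"                 # controller hostname
domain="cluster"                      # internal domain suffix -- must stay, or SSL host checks fail
search_base="ou=People,dc=local"      # ldap_search_base from sssd.conf
ldap_url="ldaps://${controller}.${domain}:636/${search_base}?uid"
echo "$ldap_url"                      # ldaps://trinityx.cluster:636/ou=People,dc=local?uid
```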

Note that the last three lines configure the reverse proxy settings. The most important is the host_regex line: if a host does not match it, no proxy connection will be made to that node. This stops your portal server being used to redirect traffic to nodes you do not control. In TrinityX we use the internal TLD suffix .cluster, so '[\w.-]+\.cluster' limits access to those servers. If you wanted to be more specific, you could use something like node\d+\.cluster or (rvis|interactive)\d+\.cluster to limit access to particular hosts in the cluster.
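To see what the regex does and does not allow, you can test candidate hostnames against it locally. The hostnames here are made up for illustration, and [A-Za-z0-9_.-] is the portable spelling of [\w.-]:

```shell
# Test hypothetical hostnames against the host_regex from ood_portal.yml
# ([A-Za-z0-9_.-] is the POSIX-safe equivalent of [\w.-])
regex='^[A-Za-z0-9_.-]+\.cluster$'
for host in node001.cluster rvis01.cluster evil.example.com; do
  if echo "$host" | grep -Eq "$regex"; then
    echo "$host: proxied"
  else
    echo "$host: blocked"
  fi
done
```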

cp /etc/ood/config/ood_portal.yml /etc/ood/config/ood_portal.yml.orig
cat >> /etc/ood/config/ood_portal.yml << EOF
auth:
  - 'AuthType Basic'
  - 'AuthName "private"'
  - 'AuthBasicProvider ldap'
  - 'AuthLDAPURL "ldaps://trinityx.cluster:636/ou=People,dc=local?uid"'
  - 'AuthLDAPGroupAttribute memberUid'
  - 'AuthLDAPGroupAttributeIsDN off'
  - 'RequestHeader unset Authorization'
  - 'Require valid-user'
host_regex: '[\w.-]+\.cluster'
node_uri: '/node'
rnode_uri: '/rnode'
EOF

Put a comment in /opt/rh/httpd24/root/etc/httpd/conf.modules.d/01-ldap.conf and explicitly set LDAPLibraryDebug to 0 (off), because, frustratingly, failing LDAP authentications will NOT produce an error in /var/log/httpd24/error_log. If you see an internal server error after login and nothing in the logs, set LDAPLibraryDebug to 1 and it may point you in the right direction (the missing .cluster, in my case).

cat >> /opt/rh/httpd24/root/etc/httpd/conf.modules.d/01-ldap.conf << EOF
# change the following LDAPLibraryDebug line to 1 if you get 500 (internal server) errors after login
LDAPLibraryDebug 0
EOF
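Rather than editing the file by hand each time you debug, you can flip the value with sed. The snippet below demonstrates the substitution on a throwaway temp copy so it is safe to run anywhere; point the same sed at the real 01-ldap.conf on the portal node:

```shell
# Demonstrate toggling LDAPLibraryDebug with sed, using a temp copy
# (on the portal node, target 01-ldap.conf instead of the temp file)
conf=$(mktemp)
printf 'LDAPLibraryDebug 0\n' > "$conf"
sed -i 's/^LDAPLibraryDebug .*/LDAPLibraryDebug 1/' "$conf"
cat "$conf"    # LDAPLibraryDebug 1
rm -f "$conf"
```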

Now we need to run the portal config generator and restart the service

/opt/ood/ood-portal-generator/sbin/update_ood_portal
systemctl try-restart httpd24-httpd.service httpd24-htcacheclean.service

If you access the website now it should prompt you for a login, and you should be able to log in as a normal user with your cluster credentials and see the most basic portal page. It will look bare compared to the screenshots above.

Configure Slurm and Remote Desktop sessions

Configure Slurm and remote access plugins

Let's fix that now and tell OOD how to use Slurm, and create the remote visualisation settings. I will cover VirtualGL for 3D acceleration using cluster GPU nodes another time; if you can install NVIDIA drivers that *REALLY WORK*, then VGL is easy.

In the following, the login host should be set to a LOGIN node. I am using the controller node here as I have no login node; I have added my user to the admins group in LDAP to allow me to log in to the controller node. You could also disable the filter restricting non-root logins to the controller node if you wish. If you do not, you will get a rather unhelpful Authentication failed message when you use the cluster shell functions.

Note that the following uses shared apps and modules to make VirtualGL and websockify work, so that you do not need to add them to the nodes and re-provision them.

See the /etc/ood/config/clusters.d/my_cluster.yml file? You can name this anything you like, such as the cluster name, and it should correspond to the title: "My Cluster" section. You can have multiple clusters. While I am mentioning this, see the cluster: "cluster" entry in the job section? This is the Slurm cluster name, in case it has been customised from the default.

mkdir -p /etc/ood/config/clusters.d/
cat > /etc/ood/config/clusters.d/my_cluster.yml << EOF
---
v2:
  metadata:
    title: "My Cluster"
  login:
    host: "trinityx.cluster"
  job:
    adapter: "slurm"
    cluster: "cluster"
    bin: "/usr/sbin/"
    conf: "/etc/slurm/slurm.conf"
    bin_overrides:
      sbatch: "/usr/bin/sbatch"
      squeue: "/usr/bin/squeue"
      scontrol: "/usr/bin/scontrol"
      scancel: "/usr/bin/scancel"
  batch_connect:
    basic:
      script_wrapper: |
        module purge
        %s
    vnc:
      script_wrapper: |
        module purge
        module add turbovnc websockify
        export WEBSOCKIFY_CMD="websockify"
        %s
EOF
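The %s in each script_wrapper is where OOD substitutes the generated job body. You can see the effect with printf; the vncserver line is a made-up stand-in for the real generated body:

```shell
# Illustrate how the job body is substituted into the vnc script_wrapper's %s
wrapper='module purge
module add turbovnc websockify
export WEBSOCKIFY_CMD="websockify"
%s'
body='vncserver -geometry 1024x768'   # hypothetical generated job body
printf "${wrapper}\n" "$body"
```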

Configure remote Desktop form

All the services are now configured ready for the bc_desktop module. Note we have NOT installed TurboVNC or websockify yet; we will get there later. Let's sort out the bc_desktop form first. There are lots of config options here to allow you to hide fields or set sensible defaults. Basically, if you set a value inline in the YAML, e.g. desktop: mate, the option will be hardcoded and NOT shown to the user. If you set it using the value: option like this:

  desktop:
    value: "xfce"

then it will be set as a default and the user can modify it. You can set things to null to hide them if they are not needed. You can use the label: override to change the label on the form, and you may add markdown-formatted help descriptions to fields with the help: override.
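The three behaviours side by side, as a hypothetical fragment (field names other than the standard bc_* ones are for illustration):

```yaml
attributes:
  desktop: "xfce"          # inline value: hardcoded, never shown to the user
  bc_vnc_idle:
    value: 180             # value: shown on the form, with 180 as the editable default
    label: "Idle timeout"  # label: override changes the form label
    help: "Markdown *help text* appears under the field"
  node_type: null          # null hides the field entirely
```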

Do NOT bother configuring the bc_vnc_resolution. The default is to resize the remote desktop based on the size of the window at the client end dynamically.

If you must, try the MATE desktop (YMMV), but when I used it I had problems with dconf being unable to write into /var/run/$UID/ as it did not exist for the user. I could have created a Slurm prolog to create it as root, but:

  1. I prefer XFCE, which doesn't need this.
  2. XFCE takes up less disk space on my tiny VM config.
  3. I am lazy, and once something works and I know I can fix it the other way, I will wait for someone to really need it before I do.

mkdir /etc/ood/config/apps/bc_desktop -p
cat > /etc/ood/config/apps/bc_desktop/my_cluster.yml << EOF
---
title: "Remote Desktop"
cluster: my_cluster
attributes:
  bc_account:
    help: "this should be left blank most of the time"
  desktop: "xfce"
  bc_vnc_idle:
    value: 180
    label: "Idle timeout"
    help: This is the time you have to connect to a session before it is automatically terminated
  node_type: null
form:
  - bc_vnc_idle
  - desktop
  - bc_account
  - bc_num_hours
  - bc_num_slots
  - node_type
  - bc_queue
  - bc_vnc_resolution
  - bc_email_on_started
EOF

Now we reconfigure the portal and restart the webserver to make our changes take effect

/opt/ood/ood-portal-generator/sbin/update_ood_portal
systemctl try-restart httpd24-httpd.service

Configure the compute node image

luna chroot compute
yum groupinstall xfce
yum install numpy
exit
luna osimage pack compute

Configure Modules for TurboVNC and websockify

TurboVNC

I cheat: I install TurboVNC locally from the RPM downloaded from the TurboVNC website, copy the /opt/TurboVNC directory into the shared apps folder, and then use the following module for the computes.

yum install turbovnc-2.2.2.x86_64.rpm
mkdir -p /trinity/shared/apps/TurboVNC
cp -r /opt/TurboVNC /trinity/shared/apps/TurboVNC/2.2.2


cat /trinity/shared/modules/tr17.10/x86_64/compiler/turbovnc/2.2.2
#%Module
#
# @name:    TurboVNC
# @version:  2.2.2
# @packaging: BIOS IT
#

# Customize the output of `module help` command
# ---------------------------------------------
proc ModulesHelp { } {
   puts stderr "\tAdds $name to your environment variables"
   puts stderr "\t\t\$PATH, \$MANPATH"
}

# Customize the output of `module whatis` command
# -----------------------------------------------
module-whatis   "loads the [module-info name] environment"

# Define internal modulefile variables (Tcl script use only)
# ----------------------------------------------------------
set   name      TurboVNC
set   version   2.2.2
set   prefix    /trinity/shared/apps/$name/$version

# Check if the path exists before modifying environment
# -----------------------------------------------------
if {![file exists $prefix]} {
   puts stderr "\t[module-info name] Load Error: $prefix does not exist"
   break
}

# Update common variables in the environment
# ------------------------------------------
prepend-path   PATH              $prefix/bin

prepend-path   MANPATH           $prefix/man

setenv         TURBOVNC_DIR      $prefix

Websockify

I install this on the controller, set PYTHONPATH, and use the --home= option so that it ends up in the shared apps folder. The setup.py will try to install numpy on the node, and on CentOS 7 this breaks; pre-empt it by installing numpy from RPM FIRST.

yum install numpy
git clone https://github.com/novnc/websockify
cd websockify
mkdir -p /trinity/shared/apps/websockify/0.8.0
PYTHONPATH=/trinity/shared/apps/websockify/0.8.0/lib/python python ./setup.py install --home=/trinity/shared/apps/websockify/0.8.0

cat /trinity/shared/modules/tr17.10/x86_64/libraries/websockify/0.8.0
#%Module
#
# @name:    websockify
# @version:  0.8.0
# @packaging: BIOS-IT
#

# Customize the output of `module help` command
# ---------------------------------------------
proc ModulesHelp { } {
   puts stderr "\tAdds websockify to your environment variables"
   puts stderr "\t\t\$PATH, \$MANPATH"
}

# Customize the output of `module whatis` command
# -----------------------------------------------
module-whatis   "loads the [module-info name] environment"

# Define internal modulefile variables (Tcl script use only)
# ----------------------------------------------------------
set   name      websockify
set   version   0.8.0
set   prefix    /trinity/shared/apps/$name/$version/

# Check if the path exists before modifying environment
# -----------------------------------------------------
if {![file exists $prefix]} {
   puts stderr "\t[module-info name] Load Error: $prefix does not exist"
   break
}

# Update common variables in the environment
# ------------------------------------------
prepend-path   PATH              $prefix/bin

prepend-path   LD_LIBRARY_PATH   $prefix/lib
prepend-path   LIBRARY_PATH      $prefix/lib
prepend-path   PYTHONPATH        $prefix/lib/python

prepend-path   MANPATH           $prefix/share/man

setenv         WEBSOCKIFY_DIR         $prefix