Installation Instructions for Ganglia and MDS4 (monitoring)
Contents:
- Overview
- Build Ganglia
- Install Ganglia
- Test Ganglia
- Configure System Service
- Configure Globus for Ganglia
- Configure Globus MDS
- Possible Problems
- Start Globus and Test
1. Overview
A general overview of Ganglia and its combination with Globus can be found at IBM:
Maximize your grid potential, Part 1: Ganglia.
In the following, only the monitoring daemon gmond is used, because MDS4 uses gmond. The Ganglia Meta Daemon gmetad, which is better suited to observing entire cluster complexes, is not considered further here.
First, we need the current Ganglia sources. The source archives are available on SourceForge via
http://ganglia.info/ (ganglia-3.0.7.tar.gz). These instructions have been tested with Ganglia 3.0.7.
2. Build Ganglia
To build the software, start in the Globus directory. As user globus:
- cd /work1/globus/
- tar xvfz /tmp/ganglia-3.0.x.tar.gz
This will unpack the archive into a directory ganglia-3.0.x/. We now change to this directory:
- cd ganglia-3.0.x/
All commands that follow are assumed to be executed from within this directory.
The Globus helper package contains a script to configure Ganglia. Assuming the package has already been unpacked into a subdirectory globus-helper in the globus user's home directory:
- cp ~/globus-helper/globus-install/ganglia.cfg .
- sh -x ganglia.cfg
Edit the file
gmond/gmond.init
and replace the line
GMOND=/usr/sbin/gmond
with
GMOND=/usr/local/globus/ganglia/sbin/gmond
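The same edit can be scripted. The following is a minimal sketch, assuming a POSIX shell with sed and mktemp; the function name set_gmond_path is made up for illustration and is not part of Ganglia or the Globus helper package.

```shell
# Hypothetical helper (not part of Ganglia): point the GMOND= line of an
# init script at a different gmond binary.
# Usage: set_gmond_path <init-script> <path-to-gmond>
# Assumption: the path contains no '|', '&' or backslash characters.
set_gmond_path() {
    script=$1
    binary=$2
    tmp=$(mktemp) || return 1
    # Rewrite only lines that start with "GMOND="; '|' is used as the sed
    # delimiter because the path contains slashes.
    sed "s|^GMOND=.*|GMOND=$binary|" "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy the result back instead of moving it, so the script keeps its
    # owner and permissions.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}
```

From the ganglia-3.0.x/ directory it would be called as: set_gmond_path gmond/gmond.init /usr/local/globus/ganglia/sbin/gmond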
Now build Ganglia:
- make
3. Install Ganglia
The installation of Ganglia is done as user root:
- make install
This installs files under /usr/local/globus/ganglia/. The libraries lib/libganglia* and the following files should now exist there:
include/ganglia.h
bin/ganglia-config
bin/gmetric
bin/gstat
sbin/gmond
Create a configuration file as user root:
- /usr/local/globus/ganglia/sbin/gmond -t > /etc/gmond.conf
Edit the file /etc/gmond.conf. Fill out the fields "name", "owner", and "latlong".
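Filling out these fields can also be scripted. The sketch below assumes that the generated file contains a section that starts with "cluster {" at the beginning of a line, ends with "}" at the beginning of a line, and holds the three fields as name = "...", owner = "..." and latlong = "..." entries; check your own /etc/gmond.conf before relying on it. The function name set_cluster_info is hypothetical.

```shell
# Hypothetical helper: set name, owner and latlong inside the
# "cluster { ... }" section of a gmond.conf, leaving "name = ..." entries
# of other sections (for example metric definitions) untouched.
# Usage: set_cluster_info <conf> <name> <owner> <latlong>
# Assumption: the values contain no '|', '&', '"' or backslash characters.
set_cluster_info() {
    conf=$1
    tmp=$(mktemp) || return 1
    # The address range limits the substitutions to the cluster section.
    sed -e "/^cluster[[:space:]]*{/,/^}/ {
        s|^\([[:space:]]*name[[:space:]]*=\).*|\1 \"$2\"|
        s|^\([[:space:]]*owner[[:space:]]*=\).*|\1 \"$3\"|
        s|^\([[:space:]]*latlong[[:space:]]*=\).*|\1 \"$4\"|
    }" "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

With the values that appear in the sample output in section 4, the call would be: set_cluster_info /etc/gmond.conf "AIP workstation cashmere" "AIP" "N52.4040 E13.1022"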
4. Test Ganglia
It should now be possible to execute gmond:
- /usr/local/globus/ganglia/sbin/gmond
gmond should then be listening on port 8649:
- telnet localhost 8649
The XML output may look somewhat messy, but it is easy for machines to read. If gmond is already running on other computers in the local network segment, expect to see output about those machines in addition to the local machine. This may cause problems later when MDS4 processes the output, and may require changes to the gmond configuration (see section 8, Possible problems).
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
<!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
...
]>
<GANGLIA_XML VERSION="3.0.5" SOURCE="gmond">
<CLUSTER NAME="AIP workstation cashmere" LOCALTIME="1193928615" OWNER="AIP" LATLONG="N52.4040 E13.1022" URL="unspecified">
<HOST NAME="cashmere.aip.de" IP="141.33.4.98" REPORTED="1193928599" TN="16" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1193928579">
<METRIC NAME="disk_total" VAL="339.425" TYPE="double" UNITS="GB" TN="27" TMAX="1200" DMAX="0" SLOPE="both" SOURCE="gmond"/>
...
</HOST>
</CLUSTER>
</GANGLIA_XML>
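To see quickly which machines appear in this output, the host names can be filtered out of the XML. A minimal sketch, assuming that each HOST element starts on its own line; the function name list_gmond_hosts is hypothetical.

```shell
# Hypothetical helper: print the NAME attribute of every <HOST> element
# found in gmond XML read from standard input, one name per line.
# Assumption: there is at most one <HOST ...> start tag per input line.
list_gmond_hosts() {
    sed -n 's/.*<HOST NAME="\([^"]*\)".*/\1/p'
}
```

It reads from a pipe, for example: telnet localhost 8649 | list_gmond_hosts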
5. Configure Ganglia as system service
It is recommended to stop any running gmond (as user root):
- gmond/gmond.init stop
and to install it permanently as a service (MDS4 works best with gmond being installed as a service):
- cp gmond/gmond.init /etc/rc.d/init.d/gmond
- /sbin/chkconfig --add gmond
- /sbin/chkconfig --list gmond
- /etc/rc.d/init.d/gmond start
6. Configure the Globus Toolkit for Ganglia
We now configure MDS4 to analyze the gmond output.
The configuration of the Globus Toolkit for Ganglia depends on the version installed.
a) For Globus Toolkit version < 4.0.5
Edit the file
$GLOBUS_LOCATION/etc/globus_wsrf_mds_usefulrp/gluerp.xml
and replace the "defaultProvider" line with
<defaultProvider>java org.globus.mds.usefulrp.glue.GangliaElementProducer</defaultProvider>
b) For Globus Toolkit version ≥ 4.0.5
Starting with version 4.0.5, Globus uses the Resource Property Provider component of the UsefulRP subsystem to communicate Ganglia information. It comes with a tool, mds-gluerp-configure, that writes the configuration files correctly. Run it twice; each run reports the file it wrote:
- mds-gluerp-configure none ganglia $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/ganglia-config.xml
Successfuly wrote configuration output file to: /usr/local/globus/gtk/etc/globus_wsrf_mds_index/ganglia-config.xml
- mds-gluerp-configure fork ganglia $GLOBUS_LOCATION/etc/gram-service-Fork/gluerp-config.xml
Successfuly wrote configuration output file to: /usr/local/globus/gtk/etc/gram-service-Fork/gluerp-config.xml
7. Configure Globus MDS
To configure Globus for MDS, edit the file
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd
and insert the following line in the section "<globalConfiguration>":
<parameter name="logicalHost" value="myhost.domain.de"/>
where
myhost.domain.de
is to be replaced by the Internet address of the machine that will run gmond.
For the MDS upload, in the file
$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml
uncomment the existing commented-out section "<upstream>" and replace its contents with
https://astrogrid-mds.aip.de:8443/wsrf/services/DefaultIndexService
8. Possible problems
The automatic exchange of information with other computers over the UDP channel is sufficient for simple Ganglia usage, but not for MDS4; the machines should instead be grouped later using MDS4.
For example, if monitoring with MDS4 runs into problems, communication with other gmond instances can be prevented by editing /etc/gmond.conf: in the sections udp_send_channel and udp_recv_channel, change the mcast_join addresses and the bind address to a value that differs from the one used in the other gmond configurations.
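This change can be sketched as a small shell function. It assumes that the two sections start with "udp_send_channel {" and "udp_recv_channel {" at the beginning of a line, end with "}" at the beginning of a line, and contain entries of the form mcast_join = ADDRESS and bind = ADDRESS; compare this with your own /etc/gmond.conf first. The function name set_mcast_address is hypothetical, and the address in the usage line below is only an example value.

```shell
# Hypothetical helper: give this gmond its own multicast address by
# rewriting mcast_join in udp_send_channel, and mcast_join and bind in
# udp_recv_channel. Other sections are left untouched.
# Usage: set_mcast_address <conf> <address>
set_mcast_address() {
    conf=$1
    addr=$2
    tmp=$(mktemp) || return 1
    # Each expression is limited to one section by its address range.
    sed -e "/^udp_send_channel[[:space:]]*{/,/^}/ s|^\([[:space:]]*mcast_join[[:space:]]*=\).*|\1 $addr|" \
        -e "/^udp_recv_channel[[:space:]]*{/,/^}/ s|^\([[:space:]]*mcast_join[[:space:]]*=\).*|\1 $addr|" \
        -e "/^udp_recv_channel[[:space:]]*{/,/^}/ s|^\([[:space:]]*bind[[:space:]]*=\).*|\1 $addr|" \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

A call could look like: set_mcast_address /etc/gmond.conf 239.2.11.72. gmond has to be restarted afterwards for the change to take effect.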
In the future, advanced methods will be developed to monitor real clusters in order to produce complex gmond output that is processed without error by MDS4.
At this time, therefore, more than one HOST entry may cause problems. The output of
- telnet localhost 8649 | grep "HOST NAME=" | wc -l
should ideally be 1.
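The check can be wrapped in a function whose exit status tells a script whether the output is safe for MDS4. A minimal sketch; the function name check_single_host and its messages are made up for illustration.

```shell
# Hypothetical helper: succeed only if the gmond XML on standard input
# contains exactly one HOST entry.
check_single_host() {
    # grep -c prints the number of matching lines; "|| true" keeps a
    # count of 0 from being treated as a failure of this assignment.
    count=$(grep -c '<HOST NAME=' || true)
    if [ "$count" -eq 1 ]; then
        echo "OK: 1 HOST entry"
    else
        echo "WARNING: $count HOST entries, MDS4 may have problems" >&2
        return 1
    fi
}
```

Like the command above, it reads from a pipe: telnet localhost 8649 | check_single_host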
9. Start Globus and test
Start the Globus container:
- /etc/init.d/globus restart
Here is an example of the contents of the log file $GLOBUS_LOCATION/var/container.log after a correct setup of Globus for Ganglia.
MDS4 and Ganglia should be communicating. We can verify this with the following query:
- wsrf-query -a -z none -s https://127.0.0.1:8443/wsrf/services/DefaultIndexService
The answer may take a few seconds, but if MDS4 can analyze the Ganglia output correctly, we should receive the names of the computers and many details about the processor, main memory, disk space, operating system, load, etc. If information is missing in the output, MDS4 has a problem; one of the problems described in section 8 may be the cause.
As an example, this is a fragment of the correct output of MDS4:
<ns11:AggregatorData>
Note especially the strings Processor, ProcessorLoad, MainMemory, OperatingSystem, Architecture, FileSystem. These are derived from Ganglia information read by MDS.
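Looking for these strings can be automated. The sketch below reads the query output from standard input and reports every string that does not occur; it assumes a grep that supports the -w option (GNU and BSD grep do). The function name check_mds_output is hypothetical.

```shell
# Hypothetical helper: report which of the expected strings are missing
# from wsrf-query output read from standard input. Returns 0 if all
# strings are present and 1 otherwise.
check_mds_output() {
    output=$(cat)
    missing=0
    for name in Processor ProcessorLoad MainMemory OperatingSystem Architecture FileSystem; do
        # -w matches whole words only, so "ProcessorLoad" alone does not
        # count as an occurrence of "Processor".
        if ! printf '%s\n' "$output" | grep -qw -- "$name"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

It can be appended to the query shown above: wsrf-query -a -z none -s https://127.0.0.1:8443/wsrf/services/DefaultIndexService | check_mds_output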
Finally, in the $GLOBUS_LOCATION/var/container.log file, it is normal to see lines like the following after the listing of SOAP services:
2008-06-17 12:14:46,580 INFO impl.DefaultIndexService [ServiceThread-43,processConfigFile:107] Reading default registration configuration from file: /usr/local/globus/gtk/etc/globus_wsrf_mds_index/hierarchy.xml



