Work Package VI: Monitoring and Interactive Steering of Grid Jobs
The simulation codes used within our astronomy community (Cactus, Gadget, Nbody6, NIRVANA, MLAP) often already provide application-specific methods for monitoring and steering parallel jobs at runtime and for real-time visualisation of intermediate results via live data streaming. Although such methods may be based on standard protocols (e.g. HTTP, streamed HDF5 over sockets), so far they are usually tied to local environments and are therefore not Grid-compatible as they stand.
In Work Package AP-VI we want to adopt existing and/or develop new application
monitoring and steering methods for our astrophysical simulation codes
which enable users to control their jobs interactively, no matter whether they
are running in a local environment or somewhere on the Grid.
Grid Monitoring requires each job to register itself with a job metadata
management system (AP-III). A unique job identifier encodes the execution
location of each job, along with a description and announcement of all the
application-specific monitoring/steering/data streaming services it provides.
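The registration step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the identifier format (site/host/application-serial) and all names (Service, JobRecord, parse_job_id) are hypothetical and do not describe the actual AP-III metadata interface.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Service:
    """One monitoring/steering/streaming service announced by a job (hypothetical)."""
    kind: str          # e.g. "monitoring", "steering", "streaming"
    protocol: str      # e.g. "http", "hdf5-stream"
    port: int
    description: str = ""


@dataclass
class JobRecord:
    """Metadata a job could hand to a job metadata management system (hypothetical)."""
    site: str
    host: str
    application: str
    description: str = ""
    services: List[Service] = field(default_factory=list)
    serial: str = field(default_factory=lambda: uuid.uuid4().hex[:12])

    @property
    def job_id(self) -> str:
        # Assumed layout: the execution location is readable from the ID itself.
        return f"{self.site}/{self.host}/{self.application}-{self.serial}"

    def to_json(self) -> str:
        # asdict() converts nested Service entries too; the ID is added explicitly
        # because properties are not dataclass fields.
        record = asdict(self)
        record["job_id"] = self.job_id
        return json.dumps(record, indent=2)


def parse_job_id(job_id: str) -> dict:
    """Recover the execution location from an identifier built by JobRecord."""
    site, host, rest = job_id.split("/", 2)
    application, serial = rest.rsplit("-", 1)
    return {"site": site, "host": host, "application": application, "serial": serial}


if __name__ == "__main__":
    job = JobRecord(
        site="aei",
        host="node17",
        application="cactus",
        description="example run",
        services=[Service("monitoring", "http", 8080, "status pages")],
    )
    print(job.to_json())
```

A client that only knows the identifier can call parse_job_id to find out where the job runs, and then look up the announced services in the registered record.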
Also necessary are mechanisms to dynamically open control connections into
running Grid jobs, taking into account any given site-specific network
administration issues (e.g. client machines hidden behind firewalls, cluster
compute nodes managed via a virtual private network).
Connections can be established either interactively by individual users or
automatically by a remote control process; in either case the protocols for
such peer-to-peer connections should be designed to be as generic
as possible and implemented transparently, adopting the file-based
access methods (being developed in AP-III) and using the standard user and Grid
application programming interfaces (provided by AP-VII).
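As a rough illustration of how steering through file-based access might look, the sketch below lets a running job poll a small steering file for parameter updates, while a user or a remote control process writes requests to it. The file format (JSON), the function names and the parameter names are all assumptions made for this example; the actual AP-III access methods are not specified here.

```python
import json
import os
import tempfile


def read_steering_updates(path, last_mtime):
    """Return (updates, mtime); updates is {} if the file is missing or unchanged."""
    try:
        mtime = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return {}, last_mtime
    if mtime == last_mtime:
        return {}, last_mtime
    with open(path) as fh:
        return json.load(fh), mtime


def write_steering_request(path, updates):
    """Client side: write to a temporary file and rename it into place,
    so the job never reads a half-written request."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(updates, fh)
    os.replace(tmp, path)


def run(steps, steering_path, params):
    """Stand-in for a simulation loop that checks for steering input every step."""
    last_mtime = None
    history = []
    for step in range(steps):
        updates, last_mtime = read_steering_updates(steering_path, last_mtime)
        # Only parameters the job already knows may be steered.
        for key, value in updates.items():
            if key in params:
                params[key] = value
        history.append(dict(params, step=step))
    return history


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    steering_file = os.path.join(workdir, "steering.json")
    write_steering_request(steering_file, {"dt": 0.05})
    for entry in run(3, steering_file, {"dt": 0.1, "output_every": 10}):
        print(entry)
```

The same loop works unchanged whether the steering file lives on a local disk or is reached through a Grid file-access layer, which is the kind of transparency the paragraph above asks for.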
Organisational Structure
Partners: AEI, AIP, ARI, MPIA
Work Package Manager: Thomas Radke (AEI)
Technical Contact Partners:
- Thomas Radke (AEI): Cactus
- Volker Springel (MPA): Gadget
- Udo Ziegler (AIP): NIRVANA
- Alexander Knebe (AIP): MLAP
- Rainer Spurzem (ARI): Nbody6
- Wolfgang Hovest (MPA): ProC
Work Schedule
- Analysis of existing monitoring and steering methods
  The monitoring and steering methods implemented in today's astrophysics simulation codes are compared and analysed with respect to potential restrictions on their use in a Grid environment.
- Design of grid-enabled monitoring and steering methods
  Based on the specific requirements of a Grid environment, generalised grid-enabled data access protocols are designed. They will serve as the basis for implementing grid-compatible monitoring and steering methods for Grid simulations.
- Implementation of grid-enabled data access methods for application monitoring
  The designed data access protocols are implemented as a suite of prototype grid-enabled methods for a selection of astrophysics simulation codes, in order to monitor simple Grid jobs with limited functionality (job monitoring in selected Grid scenarios). The implementation may use proprietary application-specific user and programming interfaces.
- Generalised version of data access methods
  Practical experience from a thorough test phase of the initial prototype of the monitoring methods is used to implement a second version. This version will use generic interfaces so that it can also be integrated easily into other simulation codes. The functionality needed to support distributed Grid simulations is implemented as well.
- Completion of monitoring methods with steering functionality
  The monitoring methods developed so far are extended with feedback capabilities in order to implement interactive steering functionality. The generalised monitoring and steering methods are integrated into astrophysics simulation codes and tested in various Grid scenarios.
- Debugging and optimisation of developed Grid middleware



