CREAM monitoring¶
The CREAM CE site is monitored with the Nagios framework and a set of specific probes. The service can be tested with job submission through WMS or with direct submittion (i.e. using CREAM CLI). The probes developed for the CREAM service must be installed on a User Interface because they use the cream-cli commands to monitor the CREAM ce.
Requirements¶
The required packages for the CREAM CE probes are:
- python (version 2.7 or greater)
- python-ldap
- python-suds (version 0.3.5 or greater)
- openssl (version 0.9.8e-12 or greater)
- nagios-submit-conf (version 0.2 or greater)
- python-GridMon (version 1.1.10)
About the last two rpms they can be install using the EGI repository
CREAM-CE direct job submission metrics and WN probes¶
The following metrics are a restructured version of the existing ones and provide a better approach for probing a CREAM CE and its WNs:
cream_serviceInfo.py
- get CREAM CE service infocream_allowedSubmission.py
- check if the submission to the selected CREAM CE is allowedcream_jobSubmit.py
- submit a job directly to the selected CREAM CEcream_jobOutput.py
- submit a job directly to the selected CREAM CE and retrieve the output-sandboxWN-softver probe
- check middleware version on WN (via cream_jobOutput.py)WN-csh probe
- check if WN has csh (via cream_jobOutput.py)
All of them have been implemented in python and are based on the cream-clicommands. They share the same logic structure and provide useful information about their version, usage (i.e. help) including the options list and their meaning, according to the guide Probes Development. For example:
$ ./cream_serviceInfo.py
Usage: cream_serviceInfo.py [options]
cream_serviceInfo.py: error: Specify either option -u URL or option -H HOSTNAME (and -p PORT) or read the help (-h)
$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-t TIMEOUT, --timeout=TIMEOUT
Probe execution time limit. [default:
120 sec]
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
$ ./cream_serviceInfo.py --version
cream_serviceInfo v.1.1
The interaction with the CREAM CE requires the use of a valid VOMS proxy expressed by the X509_USER_PROXY
env variable or through the --proxy
option. All metrics check the existence of the proxy file and calculate the time left. In case of error, the related error message will be thrown:
$ ./cream_serviceInfo.py --hostname cream-41.pd.infn.it --port 8443 --proxy /tmp/x509up_u0 --verbose
Proxy file not found or not readable
The verbose mode (--verbose
) could be enabled to each metric. It provides several details about the probe execution itself by highlighting the internal commands:
$ ./cream_serviceInfo.py --hostname prod-ce-01.pd.infn.it --port 8443 -x /tmp/x509up_u733 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking service info
executing command: /usr/bin/glite-ce-service-info prod-ce-01.pd.infn.it:8443
CREAM serviceInfo OK: Service Version = [1.16.4 - EMI version: 3.15.0-1.el6]
In case of mistakes on the selected options or on their values, the probe tries to explain what is wrong. For example the cream_serviceInfo doesn’t support the --queue
option:
$ ./cream_serviceInfo.py --hostname prod-ce-01.pd.infn.it --port 8443 --queue creamtest1 -x /tmp/x509up_u733 --verbose
Usage: cream_serviceInfo.py [options]
cream_serviceInfo.py: error: no such option: --queue
In case of the errors in interacting with the CREAM CE, useful details will be provided about the failure:
$ ./cream_allowedSubmission.py --url https://prod-ce-01.pd.infn.it:8443 -x /tmp/x509up_u733
command '/usr/bin/glite-ce-allowed-submission cream-43.pd.infn.it:8443' failed: return_code=1
details: ['2019-12-13 15:59:57,085 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection refused]\n']
Sources are available in github
cream_serviceInfo.py¶
The serviceInfo.py retrieves information about the status of the CREAM CE. The help shows how the probe must be invoked:
$ ./cream_serviceInfo.py --help
Usage: cream_serviceInfo.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-t TIMEOUT, --timeout=TIMEOUT
Probe execution time limit. [default:
120 sec]
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
In order to get information about the CREAM service on the host https://prod-ce-01.pd.infn.it:8443, use the following command:
$ ./cream_serviceInfo.py --url https://prod-ce-01.pd.infn.it:8443 -x /tmp/x509up_u733
CREAM serviceInfo OK: Service Version = [1.16.4 - EMI version: 3.15.0-1.el6]
or similary:
$ ./cream_serviceInfo.py --hostname prod-ce-01.pd.infn.it --port 8443 -x /tmp/x509up_u733
CREAM serviceInfo OK: Service Version = [1.16.4 - EMI version: 3.15.0-1.el6]
cream_allowedSubmission.py¶
This is a simple metric which checks if the submission to the selected CREAM CE is allowed. Its usage is analogous to the above metric:
$ ./cream_allowedSubmission.py --help
Usage: cream_allowedSubmission.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-t TIMEOUT, --timeout=TIMEOUT
Probe execution time limit. [default:
120 sec]
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]
Notice: the use of the --url
option is equivalent to specify both
the options: --hostname
and --port
:
$ ./cream_allowedSubmission.py --hostname prod-ce-01.pd.infn.it --port 8443 -x /tmp/x509up_u733
CREAM allowedSubmission OK: the job submission is ENABLED
$ ./cream_allowedSubmission.py --url https://prod-ce-01.pd.infn.it:8443 -x /tmp/x509up_u733
CREAM allowedSubmission OK: the job submission is ENABLED
The verbose mode highlights the internal commands:
$ ./cream_allowedSubmission.py --url https://prod-ce-01.pd.infn.it:8443 -x /tmp/x509up_u733 --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
invoking allowedSubmission
executing command: /usr/bin/glite-ce-allowed-submission prod-ce-01.pd.infn.it:8443
CREAM allowedSubmission OK: the job submission is ENABLED
cream_jobSubmit.py¶
This metric submits a job directly to the selected CREAM CE and waits until the job termination by providing the final status. Finally the job is purged. This probe does not test the output-sandbox retrieval.
$ ./cream_jobSubmit.py --help
Usage: cream_jobSubmit.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-t TIMEOUT, --timeout=TIMEOUT
Probe execution time limit. [default:
120 sec]
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]/cream-<lrms>-<queue>
-l LRMS, --lrms=LRMS The LRMS name (e.g.: 'lsf', 'pbs' etc)
-q QUEUE, --queue=QUEUE
The queue name (e.g.: 'creamtest')
-j JDL, --jdl=JDL The jdl path
The --url
(-u
) directive must be used to target the probe to a specific CREAM CE identified by its identifier (i.e. CREAM CE ID). Alternatively is it possible to specify the CREAM CE identifier by using the --hostname
, --port
, --lrms
and --queue
options which are mutually exclusive with respect to the --url
option.
Consider the JDL file hostname.jdl with the following content:
$ cat ./hostname.jdl
[
Type="Job";
JobType="Normal";
Executable = "/bin/hostname";
Arguments = "-s";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
If verbose mode is disabled, the output should look like this:
$ ./cream_jobSubmit.py --url https://prod-ce-01.pd.infn.it:8443/cream-lsf-grid -x /tmp/x509up_u733 --jdl ./hostname.jdl
CREAM JobSubmit OK [DONE-OK]
Notice: the use of the --url
option is equivalent to specify both the options: --hostname
, --port
--lrms
and –queue:
$ ./cream_jobSubmit.py --hostname prod-ce-01.pd.infn.it --port 8443 --lrms lsf --queue grid -x /tmp/x509up_u733 --jdl ./hostname.jdl
CREAM JobSubmit OK [DONE-OK]
If the verbose mode is enabled, the output of the above command should be like this:
$ ./cream_jobSubmit.py --hostname prod-ce-01.pd.infn.it --port 8443 --lrms lsf --queue grid -x /tmp/x509up_u733 --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r prod-ce-01.pd.infn.it:8443/cream-lsf-grid ./hostname.jdl
['2019-12-13 13:54:33,247 DEBUG - Using certificate proxy file [/tmp/x509up_u733]\n', '2019-12-13 13:54:33,279 DEBUG - VO from certificate=[enmr.eu]\n', '2019-12-13 13:54:33,279 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2019-12-13 13:54:33,279 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_zangrand_20191213-135433.log]\n', '2019-12-13 13:54:33,282 INFO - certUtil::generateUniqueID() - Generated DelegationID: [12815a52a76431b1712199d87ae5896fd6718b3a]\n', '2019-12-13 13:54:36,175 DEBUG - Registering to [https://prod-ce-01.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "grid"; Executable = "/bin/hostname"; Type = "Job"; Arguments = "-s"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2019-12-13 13:54:36,634 DEBUG - Will invoke JobStart for JobID [CREAM067861520]\n', 'https://prod-ce-01.pd.infn.it:8443/CREAM067861520\n']
job id: https://prod-ce-01.pd.infn.it:8443/CREAM067861520
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://prod-ce-01.pd.infn.it:8443/CREAM067861520
['\n', '****** JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM067861520]\n', '\tStatus = [DONE-OK]\n', '\tExitCode = [0]\n', '\n', '\n']
exitCode= ExitCode = [0]
job status: DONE-OK
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://prod-ce-01.pd.infn.it:8443/CREAM067861520
CREAM JobSubmit OK [DONE-OK]
cream_jobOutput.py¶
This metric extends the cream_jobSubmit.py functionality by retrieving the job’s output-sandbox. Both the stage-in and stage-out phases are both performed automatically by the CE. In particular the stage-out needs the OutputSandboxBaseDestUri="gsiftp://localhost"
set in the JDL. Finally the job is purged.
$ ./cream_jobOutput.py --help
Usage: cream_jobOutput.py [options]
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-H HOSTNAME, --hostname=HOSTNAME
The hostname of the CREAM service.
-p PORT, --port=PORT The port of the service. [default: none]
-x PROXY, --proxy=PROXY
The proxy path
-t TIMEOUT, --timeout=TIMEOUT
Probe execution time limit. [default:
120 sec]
-v, --verbose verbose mode [default: False]
-u URL, --url=URL The status endpoint URL of the service. Example:
https://<host>[:<port>]/cream-<lrms>-<queue>
-l LRMS, --lrms=LRMS The LRMS name (e.g.: 'lsf', 'pbs' etc)
-q QUEUE, --queue=QUEUE
The queue name (e.g.: 'creamtest')
-j JDL, --jdl=JDL The jdl path
-d DIR, --dir=DIR The output sandbox path
The options are the same as cream_jobSubmit.py except for --dir
. Such option allows the user to specify the path where the output-sandbox has to be stored temporarily. The default value is /var/lib/argo-monitoring/eu.egi.CREAMCE
.
Consider the JDL file hostname.jdl with the following content:
$ cat ./hostname.jdl
[
Type="Job";
JobType="Normal";
Executable = "/bin/hostname";
Arguments = "-s";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
If verbose mode is disabled, the output should look like this:
$ ./cream_jobOutput.py --hostname prod-ce-01.pd.infn.it --port 8443 --lrms lsf --queue grid -x /tmp/x509up_u733 --dir /tmp --jdl ./hostname.jdl
CREAM JobOutput OK | retrieved outputSandbox: ['std.err', 'std.out']
**** std.err ****
**** std.out ****
prod-wn-038
Notice: the use of the --dir
and the output-sandbox content returned in the output message.
If the verbose mode is enabled, the output of the above command should be like this:
$ ./cream_jobOutput.py --hostname prod-ce-01.pd.infn.it --port 8443 --lrms lsf --queue grid -x /tmp/x509up_u733 --dir /tmp --jdl ./hostname.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r prod-ce-01.pd.infn.it:8443/cream-lsf-grid ./hostname.jdl
['2019-12-13 14:02:55,478 DEBUG - Using certificate proxy file [/tmp/x509up_u733]\n', '2019-12-13 14:02:55,519 DEBUG - VO from certificate=[enmr.eu]\n', '2019-12-13 14:02:55,520 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2019-12-13 14:02:55,520 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_zangrand_20191213-140255.log]\n', '2019-12-13 14:02:55,523 INFO - certUtil::generateUniqueID() - Generated DelegationID: [b6b895d69f7ef0d438db82930476a2fd149d0501]\n', '2019-12-13 14:02:57,610 DEBUG - Registering to [https://prod-ce-01.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "grid"; Executable = "/bin/hostname"; Type = "Job"; Arguments = "-s"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; StdError = "std.err" ] - JDL File=[./hostname.jdl]\n', '2019-12-13 14:02:58,271 DEBUG - Will invoke JobStart for JobID [CREAM160637101]\n', 'https://prod-ce-01.pd.infn.it:8443/CREAM160637101\n']
job id: https://prod-ce-01.pd.infn.it:8443/CREAM160637101
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://prod-ce-01.pd.infn.it:8443/CREAM160637101
['\n', '****** JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM160637101]\n', '\tStatus = [IDLE]\n', '\n', '\n']
job status: IDLE
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://prod-ce-01.pd.infn.it:8443/CREAM160637101
['\n', '****** JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM160637101]\n', '\tStatus = [DONE-OK]\n', '\tExitCode = [0]\n', '\n', '\n']
exitCode= ExitCode = [0]
job status: DONE-OK
invoking getOutputSandbox
executing command: /usr/bin/glite-ce-job-output --noint --dir /tmp https://prod-ce-01.pd.infn.it:8443/CREAM160637101
output sandbox dir: /tmp/prod-ce-01.pd.infn.it_8443_CREAM160637101
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://prod-ce-01.pd.infn.it:8443/CREAM160637101
CREAM JobOutput OK | retrieved outputSandbox: ['std.err', 'std.out']
**** std.err ****
**** std.out ****
prod-wn-038
WN-softver probe¶
This probe checks the middleware version on a WN managed by the CREAM-CE. It makes use of cream_jobOutput.py in the following way:
$ ./cream_jobOutput.py --url https://prod-ce-01.pd.infn.it:8443/cream-lsf-grid -x /tmp/x509up_u733 --dir /tmp -j ./WN-softver.jdl
CREAM JobOutput OK | retrieved outputSandbox: ['std.err', 'std.out']
**** std.err ****
**** std.out ****
prod-wn-014 has EMI 3.15.0-1.el6
where
$ cat WN-softver.jdl
[
Type="Job";
JobType="Normal";
Executable = "WN-softver.sh";
#Arguments = "a b c";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"WN-softver.sh"};
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
and WN-softver.sh is attached.
The verbose option enabled gives the following output:
$ ./cream_jobOutput.py --url https://prod-ce-01.pd.infn.it:8443/cream-lsf-grid -x /tmp/x509up_u733 --dir /tmp -j ./WN-softver.jdl --verbose
executing command: /usr/bin/voms-proxy-info -timeleft
executing command: /usr/bin/glite-ce-job-submit -d -a -r prod-ce-01.pd.infn.it:8443/cream-lsf-grid ./WN-softver.jdl
['2019-12-13 14:06:25,768 DEBUG - Using certificate proxy file [/tmp/x509up_u733]\n', '2019-12-13 14:06:25,804 DEBUG - VO from certificate=[enmr.eu]\n', '2019-12-13 14:06:25,805 WARN - No configuration file suitable for loading. Using built-in configuration\n', '2019-12-13 14:06:25,805 DEBUG - Logfile is [/tmp/glite_cream_cli_logs/glite-ce-job-submit_CREAM_zangrand_20191213-140625.log]\n', '2019-12-13 14:06:25,805 DEBUG - Processing file [/users/cms/zangrand/cream-nagios-master/src/WN-softver.sh]...\n', '2019-12-13 14:06:25,805 DEBUG - Inserting mangled InputSandbox in JDL: [{"/users/cms/zangrand/cream-nagios-master/src/WN-softver.sh"}]...\n', '2019-12-13 14:06:25,806 INFO - certUtil::generateUniqueID() - Generated DelegationID: [7f0ac5ec8a7deefa01f207c0b341fce1568f5282]\n', '2019-12-13 14:06:27,612 DEBUG - Registering to [https://prod-ce-01.pd.infn.it:8443/ce-cream/services/CREAM2] JDL=[ StdOutput = "std.out"; BatchSystem = "lsf"; QueueName = "grid"; Executable = "WN-softver.sh"; Type = "Job"; JobType = "Normal"; OutputSandboxBaseDestUri = "gsiftp://localhost"; OutputSandbox = { "std.out","std.err" }; InputSandbox = { "/users/cms/zangrand/cream-nagios-master/src/WN-softver.sh" }; StdError = "std.err" ] - JDL File=[./WN-softver.jdl]\n', '2019-12-13 14:06:28,228 DEBUG - JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM608273414]\n', '2019-12-13 14:06:28,228 DEBUG - UploadURL=[gsiftp://prod-ce-01.pd.infn.it/var/cream_sandbox/enmr/CN_Marco_Verlato_verlato_infn_it_O_Istituto_Nazionale_di_Fisica_Nucleare_C_IT_DC_tcs_DC_terena_DC_org_enmr_eu_Role_NULL_Capability_NULL_enmr018/60/CREAM608273414/ISB]\n', '2019-12-13 14:06:28,230 INFO - Sending file [gsiftp://prod-ce-01.pd.infn.it/var/cream_sandbox/enmr/CN_Marco_Verlato_verlato_infn_it_O_Istituto_Nazionale_di_Fisica_Nucleare_C_IT_DC_tcs_DC_terena_DC_org_enmr_eu_Role_NULL_Capability_NULL_enmr018/60/CREAM608273414/ISB/WN-softver.sh]\n', '2019-12-13 14:06:28,482 DEBUG - Will invoke JobStart for JobID [CREAM608273414]\n', 'https://prod-ce-01.pd.infn.it:8443/CREAM608273414\n']
job id: https://prod-ce-01.pd.infn.it:8443/CREAM608273414
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://prod-ce-01.pd.infn.it:8443/CREAM608273414
['\n', '****** JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM608273414]\n', '\tStatus = [REALLY-RUNNING]\n', '\n', '\n']
job status: REALLY-RUNNING
invoking jobStatus
executing command: /usr/bin/glite-ce-job-status https://prod-ce-01.pd.infn.it:8443/CREAM608273414
['\n', '****** JobID=[https://prod-ce-01.pd.infn.it:8443/CREAM608273414]\n', '\tStatus = [DONE-OK]\n', '\tExitCode = [0]\n', '\n', '\n']
exitCode= ExitCode = [0]
job status: DONE-OK
invoking getOutputSandbox
executing command: /usr/bin/glite-ce-job-output --noint --dir /tmp https://prod-ce-01.pd.infn.it:8443/CREAM608273414
output sandbox dir: /tmp/prod-ce-01.pd.infn.it_8443_CREAM608273414
invoking jobPurge
executing command: /usr/bin/glite-ce-job-purge --noint https://prod-ce-01.pd.infn.it:8443/CREAM608273414
CREAM JobOutput OK | retrieved outputSandbox: ['std.err', 'std.out']
**** std.err ****
**** std.out ****
prod-wn-014 has EMI 3.15.0-1.el6
WN-csh probe¶
This probe checks that csh is there on a WN managed by the CREAM-CE. It makes use of cream_jobOutput.py in the following way:
$ ./cream_jobOutput.py --url https://prod-ce-01.pd.infn.it:8443/cream-lsf-grid -x /tmp/x509up_u733 --dir /tmp -j ./WN-csh.jdl
CREAM JobOutput OK | retrieved outputSandbox: ['std.err', 'std.out']
**** std.err ****
**** std.out ****
prod-wn-016 has csh
where
$ cat WN-csh.jdl
[
Type="Job";
JobType="Normal";
Executable = "WN-csh.sh";
#Arguments = "a b c";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"WN-csh.sh"};
OutputSandbox = {"std.out","std.err"};
OutputSandboxBaseDestUri="gsiftp://localhost";
]
and WN-csh.sh is attached.
Deployment example¶
In a Nagios server version 3.5.0 testing instance, we deployed the files needed to execute the probes described above in the following directories:
$ ls -l /usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/
total 48
-rwxr-xr-x 1 root root 1361 Jan 30 16:58 cream_allowedSubmission.py
-rwxr-xr-x 1 root root 2972 Jan 31 12:42 cream_jobOutput.py
-rwxr-xr-x 1 root root 2972 Jan 31 12:42 cream_jobSubmit.py
-rwxr-xr-x 1 root root 1416 Jan 31 12:42 cream_serviceInfo.py
drwxr-xr-x 2 root root 4096 Jan 31 11:34 cream_cli
and
$ ls -l /etc/nagios/plugins/eu.egi.CREAMCE/
total 16
-rw-r--r-- 1 root root 213 Jan 29 14:26 hostname.jdl
-rw-r--r-- 1 root root 129 Jan 30 16:21 sleep.jdl
-rw-r--r-- 1 root root 292 Jan 31 11:34 WN-csh.jdl
-rwxr-xr-x 1 root root 603 Jan 31 11:34 WN-csh.sh
-rw-r--r-- 1 root root 300 Jan 31 11:34 WN-softver.jdl
-rwxr-xr-x 1 root root 1144 Jan 31 11:34 WN-softver.sh
and defined the new services adding in the file
/etc/nagios/objects/services.cfg
the following lines:
define service{
use local-service
host_name prod-ce-01.pd.infn.it
service_description eu.egi.CREAMCE-AllowedSubmission
check_command ncg_check_native!/usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/cream_allowedSubmission.py!60!-x /tmp/x509up_u733 -p 8443
normal_check_interval 6
retry_check_interval 3
max_check_attempts 2
obsess_over_service 0
}
define service{
use local-service
host_name prod-ce-01.pd.infn.it
service_description eu.egi.CREAMCE-ServiceInfo
check_command ncg_check_native!/usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/cream_serviceInfo.py!60!-x /tmp/x509up_u733 -p 8443
normal_check_interval 6
retry_check_interval 3
max_check_attempts 2
obsess_over_service 0
}
define service{
use local-service
host_name prod-ce-01.pd.infn.it
service_description eu.egi.CREAMCE-JobSubmit
check_command ncg_check_native!/usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/cream_jobSubmit.py!60!-x /tmp/x509up_u733 -p 8443 -l lsf
-q creamtest1 -j /etc/nagios/plugins/eu.egi.CREAMCE/hostname.jdl
normal_check_interval 6
retry_check_interval 3
max_check_attempts 2
obsess_over_service 0
}
define service{
use local-service
host_name prod-ce-01.pd.infn.it
service_description eu.egi.CREAMCE.WN-Softver
check_command ncg_check_native!/usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/cream_jobOutput.py!60!-x /tmp/x509up_u733 -p 8443 -l lsf
-q creamtest1 -j /etc/nagios/plugins/eu.egi.CREAMCE/WN-softver.jdl
normal_check_interval 6
retry_check_interval 3
max_check_attempts 2
obsess_over_service 0
}
define service{
use local-service
host_name prod-ce-01.pd.infn.it
service_description eu.egi.CREAMCE.WN-Csh
check_command ncg_check_native!/usr/libexec/argo-monitoring/probes/eu.egi.CREAMCE/cream_jobOutput.py!60!-x /tmp/x509up_u733 -p 8443 -l lsf
-q creamtest1 -j /etc/nagios/plugins/eu.egi.CREAMCE/WN-csh.jdl
normal_check_interval 6
retry_check_interval 3
max_check_attempts 2
obsess_over_service 0
}
The check_command ncg_check_native was defined in the file
/etc/nagios/objects/commands.cfg
as below:
define command{
command_name ncg_check_native
command_line $ARG1$ -H $HOSTNAME$ -t $ARG2$ $ARG3$
}