ppss/wiki/Manual2.wiki

#summary PPSS Manual (Distributed)
#labels Featured

= Introduction =

To use PPSS in a distributed fashion, the following steps must be performed:

 # Set up SSH access on the server and the nodes.
 # Create a list of all nodes.
 # Create a PPSS configuration file that will be distributed to the nodes.
 # Deploy PPSS to the nodes.
 # Start PPSS on all nodes. 
 
== Preparation of server and nodes ==

The following preparations must be made in order to use PPSS in a distributed fashion:

  # Create an unprivileged user 'ppss' on the server.
  # Create an unprivileged user 'ppss' on each node.
  # Generate an SSH key without a passphrase.
  # Add the SSH key to the authorized_keys file of the 'ppss' user on the server.
  # Add the SSH key to the authorized_keys file of the 'ppss' user on each node.
  # Place PPSS on the server within the PPSS home directory. 
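
The key-related steps above could be scripted roughly as follows. This is a minimal sketch, not part of PPSS: the function names, the key file name `ppss-key.key` and the use of an RSA key are assumptions, and it expects that you can still log in to each host as 'ppss' by some other means (for example a password) while the key is being installed.

```shell
#!/bin/sh
# Hypothetical defaults; adjust them to your environment.
KEY_FILE="${KEY_FILE:-ppss-key.key}"
PPSS_USER="${PPSS_USER:-ppss}"

# Generate a key pair with an empty passphrase (-N "").
# Writes $KEY_FILE (private key) and $KEY_FILE.pub (public key).
generate_key() {
    ssh-keygen -q -t rsa -N "" -f "$KEY_FILE"
}

# Append the public key to authorized_keys of the 'ppss' user on one host.
install_key() {
    host="$1"
    ssh "$PPSS_USER@$host" \
        'umask 077; mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' \
        < "$KEY_FILE.pub"
}
```

Calling `generate_key` once and then `install_key` for the server and for every node covers the three SSH key steps in the list.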

*Security*

Please note that using SSH keys without passphrases may pose a security threat if the machines are shared with other users. You must decide for yourself whether the security risk associated with this setup is acceptable for your environment. For example, if a node is compromised, the attacker will have (initially unprivileged) access to the server.

== Create a list of nodes == 

A file must be created containing the hostnames (DNS) and/or IP addresses of all nodes. The file must contain one node per line, such as:

{{{
192.168.0.100
192.168.0.101
host.domain.com
...
}}}
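
Before handing the file to PPSS, it can be useful to walk over it yourself, for instance to check that every node is reachable. The helper below is a sketch and not part of PPSS: the name `for_each_node` is hypothetical, and skipping blank lines and lines starting with '#' is an assumption of this sketch rather than documented PPSS behaviour.

```shell
#!/bin/sh
# Run a command once per node listed in a nodes file (one node per line).
# The node name is appended as the last argument of the command.
for_each_node() {
    nodes_file="$1"
    shift
    # The '|| [ -n "$node" ]' part also handles a last line without newline.
    while IFS= read -r node || [ -n "$node" ]; do
        case "$node" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # Redirect stdin so the command cannot swallow the rest of the file.
        "$@" "$node" < /dev/null
    done < "$nodes_file"
}
```

For example, `for_each_node nodes.txt ping -c 1` sends a single ping to every node in nodes.txt.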

== Create a PPSS configuration file ==

This is the most important part of setting up distributed PPSS. It is exactly the same as setting up a configuration file for standalone mode, except that more options are necessary. 

The best way to explain how to create a configuration file for distributed PPSS is to provide an example. In this example, a script is used to encode WAV files to MP3. This script is called 'encode.sh' and takes a filename as an argument. 

`./ppss.sh config -C config.cfg -c 'encode.sh ' -d /source/dir -s 192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -t -o /some/output/dir`

It is quite a long command line, but it is executed only once. After that, the configuration file config.cfg can be used for all further commands.

*Mode*

The first option sets the mode, in this case 'config' to generate a configuration file. 

*Configuration file*

The second option, -C, specifies the name of the configuration file to be created.

*Command*

The third option, -c, specifies the command to be executed. *Please take special note of the single quotes and the space after the command.* You can also read -c 'encode.sh ' as -c 'encode.sh "$ITEM"'.
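
The contents of encode.sh are not shown in this manual. As an illustration only, a script that fits the description (one file name as its argument, WAV in, MP3 out) might look like the sketch below. The use of the `lame` encoder and the helper names `mp3_name` and `encode` are assumptions, and where the result must be written so that PPSS picks it up is not covered here.

```shell
#!/bin/sh
# encode.sh (hypothetical contents): encode one WAV file to MP3.
# PPSS appends the item to the command, so the file name arrives as $1.

# Derive the output name from the input name: song.wav -> song.mp3
mp3_name() {
    printf '%s\n' "${1%.wav}.mp3"
}

# Encode a single file; assumes the 'lame' encoder is installed on the node.
encode() {
    lame "$1" "$(mp3_name "$1")"
}

# Only encode when a file name was actually passed.
if [ "$#" -gt 0 ]; then
    encode "$1"
fi
```

Quoting `"$1"` throughout keeps file names with spaces intact, which is the same reason the item is passed as `"$ITEM"` in the expanded form of the command.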

*Source directory*

The -d option specifies the location on the *server* of the files that must be processed. These files are transferred to the nodes using SCP for local processing.

*Server*

The -s option specifies the SSH server that acts as both file server and SSH server for communication between nodes. The SSH server is mainly used for file locking: nodes know that locked files are already processed or being processed, so another unlocked file must be selected.

If the server acts as both file server and SSH server, it is not recommended to also use it as a node, in this case for encoding. File transfers over SSH can take quite some processing power.
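
The file-locking idea can be illustrated with a small sketch. It is not taken from the PPSS source: it assumes locks are directories created with `mkdir`, which either creates the directory or fails if it already exists, and the names `LOCK_DIR`, `lock_item` and `next_unlocked_item` are hypothetical. Items are assumed to be plain file names without slashes.

```shell
#!/bin/sh
# Directory that holds one lock per item (hypothetical path).
LOCK_DIR="${LOCK_DIR:-/tmp/ppss-locks}"

# Try to lock an item. Only one caller can create the lock directory;
# every later attempt fails because the directory already exists.
lock_item() {
    mkdir -p "$LOCK_DIR" && mkdir "$LOCK_DIR/$1.lock" 2>/dev/null
}

# Print the first item that could be locked; return 1 if none is left.
next_unlocked_item() {
    for item in "$@"; do
        if lock_item "$item"; then
            printf '%s\n' "$item"
            return 0
        fi
    done
    return 1
}
```

In a distributed setup the `mkdir` would have to run on the server, for example through `ssh`, so that all nodes compete for the same set of locks.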

*User name*

The -u option specifies the name of the local system user that the nodes use to log on to the server. For deployment, such a user must also be present on the nodes.

*SSH Key*

Scripts using SSH require an SSH key without a passphrase. This key must be uploaded to the nodes, and the nodes must know which key to use, so it must be specified with the -k option.

*Script or program that must be uploaded*

The -S option specifies the script or program that should be uploaded to the node because it must be executed by the node for distributed computing. In this case, the encode.sh script must be deployed on all nodes and thus specified. 

*List of nodes*

The -n option specifies the file containing all nodes. For every node, PPSS will perform actions such as deploy, start, stop and pause.

*Transfer files to local host*

If the -t option is specified, each file is copied from the source directory to a local temporary working directory for local processing. This is necessary if SCP is used to access the files that must be processed.

If files are distributed over NFS or SMB, they appear to be present on the local system, because the share is just a mount point and thus part of the local file system. In this case the -t option can be omitted. However, if it is specified, files are copied to a local directory using 'cp'.

*The output directory*

If the -t option is used, the -o option specifies the destination directory on the server. The results are uploaded to this directory. If the -t option is not specified, the command 'cp' is used to transfer files back to the specified output directory. 
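
The choice between 'scp' and 'cp' can be pictured with the sketch below. It simplifies the rules above to a single question (is an SSH server configured or not) and is not how PPSS is implemented. All variable and function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical settings; an empty SSH_SERVER means the source and output
# directories are local paths (for example NFS or SMB mounts).
SSH_SERVER="${SSH_SERVER:-}"
SSH_USER="${SSH_USER:-ppss}"
SSH_KEY="${SSH_KEY:-ppss-key.key}"

# Copy one item from the source directory into the local working directory.
fetch_item() {
    src_dir="$1"; item="$2"; work_dir="$3"
    if [ -n "$SSH_SERVER" ]; then
        scp -q -i "$SSH_KEY" "$SSH_USER@$SSH_SERVER:$src_dir/$item" "$work_dir/"
    else
        cp "$src_dir/$item" "$work_dir/"
    fi
}

# Copy one result file back to the output directory.
store_result() {
    result="$1"; out_dir="$2"
    if [ -n "$SSH_SERVER" ]; then
        scp -q -i "$SSH_KEY" "$result" "$SSH_USER@$SSH_SERVER:$out_dir/"
    else
        cp "$result" "$out_dir/"
    fi
}
```

With `SSH_SERVER` left empty, both functions fall back to plain 'cp', which matches the NFS/SMB case described above.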

== Deploy PPSS to nodes ==

Once SSH access is set up and the configuration file is generated, PPSS can be deployed to the nodes. This is very simple, as this example demonstrates:

`./ppss.sh deploy -C config.cfg`

During the phase when we generated the configuration file, a nodes file was specified. Thus PPSS knows, just by reading this configuration file, which file contains a list of nodes.

{{{
bash-3.2$ ./ppss.sh deploy -C config.cfg
mrt 12 22:18:22: INFO   - ---------------------------------------------------------
mrt 12 22:18:22: INFO   - Distributed Parallel Processing Shell Script version 2.03
mrt 12 22:18:22: INFO   - Hostname: MacBoek.local
mrt 12 22:18:22: INFO   - Deploying PPSS on nodes.
mrt 12 22:18:24: INFO   - PPSS installed on node 192.168.1.14.
mrt 12 22:18:28: INFO   - PPSS installed on node 192.168.1.12.
mrt 12 22:18:29: INFO   - PPSS installed on node 192.168.1.4.
mrt 12 22:18:31: INFO   - PPSS installed on node 192.168.1.31.
}}}

Deployment of PPSS is currently serial: the nodes are handled one after another. Performing the deployment in parallel is a to-do.

== Start PPSS on nodes ==

Starting PPSS on all nodes is just as simple as deploying it:

`./ppss.sh start -C config.cfg`

{{{
mrt 12 22:21:17: INFO   - ---------------------------------------------------------
mrt 12 22:21:17: INFO   - Distributed Parallel Processing Shell Script version 2.03
mrt 12 22:21:17: INFO   - Hostname: MacBoek.local
mrt 12 22:21:17: INFO   - Starting PPSS on node 10.0.0.14.
mrt 12 22:21:17: INFO   - Starting PPSS on node 10.0.0.12.
mrt 12 22:21:20: INFO   - Starting PPSS on node 10.0.0.4.
mrt 12 22:21:20: INFO   - Starting PPSS on node 10.0.0.31.
}}}

== Stop, pause and continue PPSS on nodes ==

To stop, pause or continue processing on all nodes, use the following commands:

`./ppss.sh stop -C config.cfg`
`./ppss.sh pause -C config.cfg`
`./ppss.sh continue -C config.cfg`

Please note that if stop or pause is selected, nodes finish the item they are currently working on. They only stop picking up new items.

== Show progress ==

The overall progress of the 'cluster' is determined by the number of files present in the input and output directories on the server.

{{{

bash-3.2$ ./ppss.sh status -C config.cfg
mrt 12 21:06:17: INFO   - ---------------------------------------------------------
mrt 12 21:06:17: INFO   - Distributed Parallel Processing Shell Script version 2.03
mrt 12 21:06:17: INFO   - Hostname: MacBoek.local
mrt 12 21:06:17: INFO   - 56 percent complete.

}}}
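
The calculation behind such a percentage can be sketched as follows. This is an illustration under stated assumptions, not the PPSS implementation: it assumes that input files stay in the input directory, that every processed item yields exactly one file in the output directory, and that the result is rounded down. The function names are hypothetical.

```shell
#!/bin/sh
# Count the regular files directly inside a directory.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Print the percentage of items that already have a result (rounded down).
percent_complete() {
    total=$(count_files "$1")      # number of input files
    finished=$(count_files "$2")   # number of output files
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( finished * 100 / total ))
}
```

Usage: `percent_complete /source/dir /some/output/dir`. With integer division, 23 finished items out of 625 give 3 percent.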

== Logging ==

An important feature of PPSS is its extensive logging. There are two types of log files. 

  * A single log file created by PPSS itself. Each node keeps this file locally. Using tail -f on it, you can monitor what PPSS is currently doing on that node.

{{{
mrt 10 23:51:15: INFO   - ---------------------------------------------------------
mrt 10 23:51:15: INFO   - Distributed Parallel Processing Shell Script version 2.03
mrt 10 23:51:15: INFO   - ---------------------------------------------------------
mrt 10 23:51:15: INFO   - Hostname: MacBoek.local
mrt 10 23:51:15: DEBUG  - Found 2 logic processors.
mrt 10 23:51:15: INFO   - ---------------------------------------------------------
mrt 10 23:51:17: DEBUG  - Job log directory JOB_LOG exists.
mrt 10 23:51:18: INFO   - Listener started.
mrt 10 23:51:18: INFO   - Starting 2 workers.
mrt 10 23:51:18: INFO   - Currently 0 percent complete. Processed 0 of 625 items.
mrt 10 23:51:18: DEBUG  - Trying to lock item 20060731.wav.
mrt 10 23:51:18: DEBUG  - Item 20060731.wav is locked.
mrt 10 23:51:18: INFO   - Currently 0 percent complete. Processed 1 of 625 items.
mrt 10 23:51:18: DEBUG  - Trying to lock item 20060801.wav.
mrt 10 23:51:18: DEBUG  - Item 20060801.wav is locked.
mrt 10 23:51:18: INFO   - Currently 0 percent complete. Processed 2 of 625 items.
mrt 10 23:51:18: DEBUG  - Trying to lock item 20060802.wav.
mrt 10 23:51:18: DEBUG  - Item 20060802.wav is locked.
mrt 10 23:51:18: INFO   - Currently 0 percent complete. Processed 3 of 625 items.
mrt 10 23:51:18: DEBUG  - Trying to lock item 20060803.wav.
mrt 10 23:51:18: DEBUG  - Item 20060803.wav is locked.
............
mrt 10 23:51:23: DEBUG  - Item 20060830.wav is locked.
mrt 10 23:51:23: INFO   - Currently 3 percent complete. Processed 23 of 625 items.
mrt 10 23:51:23: DEBUG  - Trying to lock item 20060831.wav.
mrt 10 23:51:23: DEBUG  - Got lock on 20060831.wav, processing.
mrt 10 23:51:23: DEBUG  - Transfering item 20060831.wav to local disk.
mrt 10 23:52:18: DEBUG  - Exit code of transfer is 0
mrt 10 23:52:18: DEBUG  - Processing item 20060831.wav

}}}

  * An individual log file containing information and output of each processed item. These files are uploaded to the 'job_log' directory on the SSH server. For every item, a log file must be present.

{{{
===== PPSS Item Log File =====
Host:		MacBoek.local
Item:		PPSS_LOCAL_TMPDIR/20060831.wav
Start date:	mrt 10 23:52:18

Encode of PPSS_LOCAL_TMPDIR/20060831.wav successful.

Status:		Succes - item has been processed.
Elapsed time (h:m:s): 0:5:23

}}}
 
As you can see, with a few simple grep commands, it is possible to quickly determine which items have failed to process.
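
One such grep command is sketched below. It relies only on the success line shown in the sample item log ("Status: Succes ...") and lists every log file that lacks it. The wording of a failure status is not shown in this manual, so the sketch does not try to match it. The function name `failed_items` is hypothetical.

```shell
#!/bin/sh
# List all item log files in a directory that do NOT contain the success
# status line. 'grep -L' prints the names of files without a match.
failed_items() {
    grep -L 'Status:.*Succes' "$1"/*
}
```

Usage: `failed_items job_log` prints one log file name per item that did not report success.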