#summary Design and technical overview.
#labels Phase-Design

= Introduction =

This wiki page describes how PPSS is designed, how it works, and which techniques it uses.

*Please note that the design changed in version 2.80 and differs from older versions.*

= Design =

Two main ingredients must be supplied to PPSS:

  # A list of items that must be processed:
    * either a text file containing one item per line (these items can represent whatever you want);
    * or a directory containing files that must be processed.
  # A command that must be executed for each item.

For every item, the specified command is executed with the item supplied as an argument.

The design follows these rules:

  * At any given moment, no more commands run in parallel than the number specified on the command line or, if none is specified, the detected number of CPU cores.
  * Two processes running in parallel should never interfere or collide with each other by processing the same item.
  * PPSS should not poll. It should wait for events to occur and do nothing if there is nothing to do.

== Communication between parent and child processes ==

One of the main difficulties for shell scripts is inter-process communication. The shell offers no ready-made mechanism for child and parent processes to communicate with each other.

A solution might be to use signals, with the 'trap' command catching events. However, tests have shown that this is not reliable. The trap mechanism that bash employs is inherently unreliable (by design): while a trap is being processed, additional traps are ignored, so events can be lost. Therefore, it is not possible to create a reliable mechanism using signals. There is actually a parallel processing shell script available on the web that is based on signals and suffers from exactly this problem, which makes it unreliable.

Repeated tests have determined, however, that communication between processes through a FIFO (named pipe) is reliable and can be used for inter-process communication.
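
The FIFO approach described above can be illustrated with a minimal sketch. This is not code taken from PPSS: the paths, the `child` function and the message text are hypothetical. It also assumes a system such as Linux, where a FIFO may be opened read-write (POSIX leaves this unspecified), so that opening it does not block and the reader never sees end-of-file.

```shell
# Hypothetical scratch location for the demo.
WORKDIR=$(mktemp -d)
FIFO="$WORKDIR/fifo"
mkfifo "$FIFO"

# Open the FIFO read-write on file descriptor 3 (assumption: the platform
# allows this, as Linux does). Children inherit the descriptor.
exec 3<>"$FIFO"

# Stand-in for a child process: it reports completion by writing one line.
child() {
    echo "done $1" >&3
}

for i in 1 2 3; do
    child "$i" &
done

# The parent blocks in 'read' until a message arrives, so there is no polling.
RECEIVED=0
MESSAGES=""
while [ "$RECEIVED" -lt 3 ] && read -r MSG <&3; do
    RECEIVED=$((RECEIVED + 1))
    MESSAGES="$MESSAGES$MSG;"
done

wait
exec 3>&-
rm -rf "$WORKDIR"
echo "received $RECEIVED messages"
```

Each child writes a single short line, and the parent sleeps inside `read` until one arrives. Unlike a signal handler, the pipe buffers messages that arrive while the parent is busy, so none are dropped.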
== Queue management ==

A single listener process continually requests items and starts processes to handle them. When a process ends, an event is generated and sent to the FIFO, which triggers the listener to start a new process. Since the listener is the only process that requests items, no locking mechanism is required. Versions of PPSS before 2.80 had a cumbersome locking mechanism to prevent race conditions; as of 2.80 this is no longer necessary.

Locking is only used in distributed mode. Single items are 'locked', or claimed, by one of the nodes of the 'cluster' through SSH on the main server. This prevents other nodes from processing the same item.

== Technical design ==

http://home.quicknet.nl/mw/prive/nan1/got/ppss-schema.png

=== Function: get_all_items ===

The first step of PPSS is to read all items that must be processed into a special text file. Items are then read from this file with 'sed' by the get_item function.

=== Function: listen_for_job ===

The second step is to start the listener. This is a process running in the background that listens on a FIFO special file (named pipe). For every message that is received, the listener executes the get_item function to get an item. The commando function is then executed with this item as an argument, and it runs as a background process.

Once the whole list of items has been processed, the get_item function returns a non-zero return code and the listen_for_job function does not start a new commando process. Thus over time, as commando jobs finish, all jobs die out. Once listen_for_job registers that all running jobs have died, it kills off PPSS itself.

The whole listen_for_job function is executed as a background process, not just the user-supplied command. This function is the only permanent (while) loop in PPSS. It blocks whenever no input is received, so it is doing nothing most of the time.
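
The listener loop described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the PPSS source: the message texts (`start`, `job_done`), the `RUNNING` counter, the file names and the bodies of `get_item` and `commando` are all invented for the example, and the listener simply leaves its loop instead of killing the main script. It also relies on the FIFO being opened read-write, which works on Linux but is left unspecified by POSIX.

```shell
WORKDIR=$(mktemp -d)                 # hypothetical scratch directory
FIFO="$WORKDIR/fifo"
ITEMS="$WORKDIR/items.txt"           # stand-in for the special text file
RESULTS="$WORKDIR/results.txt"
printf '%s\n' alpha beta gamma delta epsilon > "$ITEMS"
: > "$RESULTS"
mkfifo "$FIFO"
exec 3<>"$FIFO"                      # assumption: read-write open is allowed

MAX_JOBS=2
POINTER=0
TOTAL=$(($(wc -l < "$ITEMS")))

# Return the next item in ITEM, or fail when the list is exhausted.
get_item() {
    [ "$POINTER" -lt "$TOTAL" ] || return 1
    POINTER=$((POINTER + 1))
    ITEM=$(sed -n "${POINTER}p" "$ITEMS")
}

# Stand-in for the real job: record the item, then notify the listener.
commando() {
    echo "processed $1" >> "$RESULTS"
    echo "job_done" >&3
}

# Only this function calls get_item, so no locking is needed.
listen_for_job() {
    RUNNING=0
    while read -r EVENT <&3; do      # blocks until a message arrives
        if [ "$EVENT" = "job_done" ]; then
            RUNNING=$((RUNNING - 1))
        fi
        if get_item; then
            commando "$ITEM" &
            RUNNING=$((RUNNING + 1))
        elif [ "$RUNNING" -eq 0 ]; then
            break                    # list exhausted and all jobs finished
        fi
    done
}

listen_for_job &

# Prime the listener with one message per parallel slot.
i=0
while [ "$i" -lt "$MAX_JOBS" ]; do
    echo "start" >&3
    i=$((i + 1))
done

wait
COUNT=$(($(wc -l < "$RESULTS")))
PROCESSED=$(sort "$RESULTS")
exec 3>&-
rm -rf "$WORKDIR"
echo "processed $COUNT items"
```

Because every message starts at most one job and every job emits exactly one message when it ends, the number of jobs running at once never exceeds the number of initial `start` messages.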
=== Function: start_all_workers ===

For every available CPU core, a parallel thread is started. If the user manually specifies a number of threads, that number overrides the detected number of cores.

For each thread, the start_single_worker function is called. This function requests an item with the get_item function and sends the item to the FIFO. There, it is picked up by the listener process, which executes the commando function to process the item.

=== Command function ===

The command function (commando) performs the following tasks:

  * check whether the supplied item has already been processed, which is the case if a job log exists for it; if so, skip it;
  * execute the user-supplied command with the item as an argument;
  * execute the start_single_worker function to start a new job for a new item.

The third task is the most relevant. After the command finishes, commando calls the start_single_worker function, so every finished job triggers the next one: a snake-biting-its-own-tail mechanism.

=== start_single_worker function ===

The start_single_worker function sends a message to the FIFO to inform the listener process that a commando should be executed.

=== get_item function ===

When called, get_item reads an item from the special input file and increments a pointer into that file, so that the next call returns the next item on the list.
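
As a rough illustration of the get_item pointer and the job-log check performed by commando, consider the sketch below. It runs sequentially and is not the PPSS implementation: the variable names, the job-log layout (one file per item, named after the item) and the `echo` stand-in for the user-supplied command are assumptions, and start_single_worker is reduced to a stub that only counts its calls. Whether PPSS also calls start_single_worker for a skipped item is assumed here.

```shell
WORKDIR=$(mktemp -d)                 # hypothetical scratch directory
INPUT_FILE="$WORKDIR/input.txt"      # stand-in for the special input file
JOB_LOG_DIR="$WORKDIR/job_log"       # hypothetical job-log location
mkdir "$JOB_LOG_DIR"
printf '%s\n' one two three > "$INPUT_FILE"

POINTER=0
TOTAL=$(($(wc -l < "$INPUT_FILE")))

# Read line number POINTER+1 with sed and advance the pointer.
# Fails once every line has been handed out.
get_item() {
    [ "$POINTER" -lt "$TOTAL" ] || return 1
    POINTER=$((POINTER + 1))
    ITEM=$(sed -n "${POINTER}p" "$INPUT_FILE")
}

# Stub: the real function sends a message to the FIFO.
SIGNALS=0
start_single_worker() {
    SIGNALS=$((SIGNALS + 1))
}

USER_COMMAND="echo"                  # stand-in for the user-supplied command
EXECUTED=""

commando() {
    LOG="$JOB_LOG_DIR/$1"            # simplification: item name as file name
    if [ ! -e "$LOG" ]; then
        # No job log yet: run the command and keep its output as the log.
        $USER_COMMAND "$1" > "$LOG"
        EXECUTED="$EXECUTED $1"
    fi
    # Assumption: the next job is requested even when the item was skipped.
    start_single_worker
}

# Pretend that "two" was already processed in an earlier run.
: > "$JOB_LOG_DIR/two"

while get_item; do
    commando "$ITEM"
done

rm -rf "$WORKDIR"
echo "executed:$EXECUTED"
```

In this sketch the job log doubles as a completion marker, so an interrupted run can be restarted without reprocessing items that already have a log.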