Edited wiki page through web user interface.

This commit is contained in:
Louwrentius 2010-06-27 23:14:42 +00:00
parent 4abd5fc7a3
commit 84d30808d2
1 changed file with 20 additions and 17 deletions


There are two main ingredients that must be supplied to PPSS:
For every item, the specified command will be executed with the item supplied as an argument.
* At any given moment, no more commands will be running in parallel than the number specified on the command line or, by default, the detected number of CPU cores.
* Two parallel running processes should never interfere or collide with each other by processing the same item.
* PPSS should not poll but wait for events to occur and 'do nothing' if there is nothing to do. It must be event-driven.
== Communication between parent and child processes ==
One of the main difficulties for shell scripts is interprocess communication. There is no built-in mechanism for child and parent processes to communicate with each other. A solution might be the use of signals with the 'trap' command to catch events; however, tests have shown that this is not reliable. The trap mechanism that bash employs is inherently unreliable (by design): while the trap command is processing a trap, additional traps are ignored. Therefore, it is not possible to create a reliable mechanism using signals. There is actually a parallel processing shell script available on the web that is based on signals, and it suffers from exactly this problem, which makes it unreliable.
However, repeated tests have determined that communication between processes using a FIFO named pipe is reliable and can be used for interprocess communication. PPSS uses a FIFO to allow a child process to communicate with the parent process.
Within PPSS, a child process only tells the master process one thing: 'I have finished processing'. In response, a new process is started for the next item.
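This mechanism can be sketched in a few lines of bash. This is a minimal illustration, not PPSS's actual code; the message text and the pipe path are made up:

```shell
#!/usr/bin/env bash
# Sketch of FIFO-based child-to-parent signalling: the child writes a
# line into a named pipe when it is done, and the parent blocks on a
# read of that pipe until the line arrives. No polling is involved.

FIFO="$(mktemp -u)"          # hypothetical path for the named pipe
mkfifo "$FIFO"

(
    sleep 0.1                # stand-in for the real work on an item
    echo "FINISHED" > "$FIFO"
) &                          # child process

read MSG < "$FIFO"           # parent blocks here until the child writes
echo "parent received: $MSG"

wait                         # reap the child
rm "$FIFO"
```

Opening a FIFO for reading blocks until a writer opens it (and vice versa), which is exactly why the parent can sit idle without burning CPU.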
== Queue management ==
There is a single listener process that is just waiting for events to occur, by listening to a FIFO. The most important event is that a worker process should be started. This listener process will request a new item and will start a worker process to process this item.
Since the listener is the central process that requests items, no locking mechanism is required. Versions of PPSS before 2.80 had a cumbersome locking mechanism to prevent race conditions, however as of 2.80 this is no longer necessary.
Locking is only used to lock individual items. This allows multiple instances of PPSS to process the same local pool of items. For example, suppose you started PPSS with two workers, but it turns out there is room for more. Just execute PPSS again with the same parameters and you will have two instances of PPSS processing the same pool of items.
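Per-item locking can be sketched with an atomic 'mkdir', a common shell idiom for a test-and-set lock. The lock directory layout and the function name below are assumptions, not PPSS's actual implementation:

```shell
# Sketch of per-item locking so that two PPSS instances never process
# the same item twice. Directory creation is atomic, so whichever
# instance creates the lock directory first owns the item.

LOCKDIR="./ppss_locks"       # hypothetical lock directory
mkdir -p "$LOCKDIR"

claim_item () {
    # Succeeds (and prints the item) only for the instance that
    # creates the lock first; every later attempt fails.
    local item="$1"
    if mkdir "$LOCKDIR/$item.lock" 2>/dev/null; then
        echo "$item"
    else
        return 1
    fi
}

claim_item "photo-001"                            # first claim succeeds
claim_item "photo-001" || echo "photo-001 already claimed"
```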
== Technical design ==
http://home.quicknet.nl/mw/prive/nan1/got/ppss-schema.png
=== Function: get_all_items ===
The first step of PPSS is to read all items that must be processed into a special text file. Items are read from this file using 'sed' and fed to the get_item function.
=== get_item function ===
If called, an item is read from the special input file: 'sed' selects the line whose number matches a global counter, and the counter is then increased, so the next time the function is executed, the next item on the list is returned.
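A minimal sketch of this counter-plus-sed approach; the file name, variable names, and items are illustrative:

```shell
# Sketch of get_item: pull one line from the item list by line number,
# where the line number is a global counter incremented on every call.

ITEMFILE="$(mktemp)"                 # stand-in for PPSS's internal list
printf '%s\n' alpha beta gamma > "$ITEMFILE"

COUNTER=0
get_item () {
    COUNTER=$((COUNTER + 1))
    local item
    item="$(sed -n "${COUNTER}p" "$ITEMFILE")"   # print only line N
    [ -n "$item" ] || return 1       # non-zero return: list exhausted
    echo "$item"
}

get_item    # prints alpha
get_item    # prints beta
```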
=== Function: listen_for_job ===
The listen_for_job function is a process running in the background that listens on a FIFO special file (named pipe).
For every message that is received, the listener will execute the 'get_item' function to get an item. The commando function is then executed with this item as an argument, as a background process.
If the list of items has been processed, the get_item function will return a non-zero return code, and the listen_for_job function will not start a new commando process. Thus over time, as commando jobs finish, all jobs die out. Once listen_for_job registers that all running jobs have died, it kills off PPSS itself.
The listen_for_job function keeps a count of worker threads that have died. Once this number hits the maximum number of parallel workers (e.g. 4 if you have a quad-core CPU), it will terminate itself and, eventually, PPSS itself.
The whole listen_for_job function is executed as a background process. This function is the only permanent (while) loop running and is often blocked when no input is received, so it is doing nothing most of the time. This means that if PPSS has nothing to do, your system won't be wasting CPU cycles on some looping or polling.
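The blocking event loop can be sketched like this. The message names and the log file are assumptions made so the fragment is runnable on its own; the real protocol differs in detail:

```shell
# Sketch of the listener's event loop: a single while-loop that sits
# blocked on a FIFO read whenever there is nothing to do, so an idle
# run consumes no CPU cycles.

FIFO="$(mktemp -u)"
mkfifo "$FIFO"
LOG="$(mktemp)"              # records what the listener did

listen_for_job () {
    local msg
    while read msg < "$FIFO"; do       # blocks here while idle
        case "$msg" in
            START_NEW_JOB) echo "worker started" >> "$LOG" ;;
            STOP)          break ;;
        esac
    done
}

listen_for_job &             # the whole function runs in the background
LISTENER_PID=$!

echo "START_NEW_JOB" > "$FIFO"
echo "STOP" > "$FIFO"
wait "$LISTENER_PID"
rm "$FIFO"
```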
=== Function: start_all_workers ===
For every available CPU core, a thread will be started. If the user manually specifies a number of threads, that number overrides the detected number of CPU cores.
So the start_single_worker function is called for each thread. This function just sends a message to the FIFO. There, it is picked up by the listener process, which requests an item and executes the commando function to process it.
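A sketch of this start-up logic. A plain file stands in for the FIFO so the fragment runs without a listener, and 'nproc' plus all names used here are assumptions:

```shell
# Sketch of start_all_workers: detect the CPU core count (a manually
# supplied thread count wins if given) and prime the pipeline with one
# start message per worker.

USER_THREADS=""                          # e.g. taken from an option flag
CORES="$(nproc 2>/dev/null || echo 2)"   # fall back if nproc is absent
MAX_WORKERS="${USER_THREADS:-$CORES}"

QUEUE="$(mktemp)"                        # stand-in for the named pipe

start_single_worker () {
    echo "START_NEW_JOB" >> "$QUEUE"     # one message per worker
}

i=0
while [ "$i" -lt "$MAX_WORKERS" ]; do
    start_single_worker
    i=$((i + 1))
done
```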
=== Command function ===
The command function performs the following tasks:
* execute the user-supplied command with the item as an argument
* execute the 'start_single_worker' function to start a new job for a new item.
The third option is the most relevant. After the command finishes, it calls the start_single_worker function: the snake biting its own tail. Essentially, a running thread keeps itself running by starting a new thread after it finishes, until there are no items left to process.
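The tail-biting chain can be sketched synchronously. This is a deliberate simplification: real PPSS routes the hand-off through the FIFO and background processes, and every name here is illustrative:

```shell
# Sketch of the self-sustaining loop: commando processes its item and
# then calls start_single_worker, which fetches the next item and runs
# commando again; the chain dies out when get_item finds nothing left.

ITEMS="a b c"
PROCESSED=""

get_item () {
    set -- $ITEMS                 # split the remaining items into $1..
    [ "$#" -gt 0 ] || return 1    # empty list: signal exhaustion
    NEXT_ITEM="$1"
    shift
    ITEMS="$*"
}

commando () {
    PROCESSED="$PROCESSED $1"     # stand-in for the user-supplied command
    start_single_worker           # the snake bites its own tail
}

start_single_worker () {
    if get_item; then
        commando "$NEXT_ITEM"
    fi                            # otherwise this chain of work dies out
}

start_single_worker
echo "processed:$PROCESSED"
```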
=== start_single_worker function ===
The start_single_worker function sends a message to the FIFO to inform the listener process that a commando should be executed.