Edited wiki page through web user interface.

Louwrentius 2010-07-19 08:32:20 +00:00
parent 6e2a9b77ed
commit 62e98909ea

@@ -3,13 +3,13 @@
= Introduction =
Most computer systems bought over the last couple of years feature at least two processor cores, sometimes more. Most programs and tasks do not benefit from these extra CPU cores because software must be written in such a way that it benefits from extra CPUs. Most of the time, only one CPU or CPU core is used.
Most recent computer systems feature at least two processor cores, sometimes more. Most programs and tasks do not benefit from these extra CPU cores because software must be (re)written in such a way that it benefits from extra CPU cores. Most of the time, only one CPU or CPU core is used. This is a waste of resources.
Most users can't benefit from these extra CPU cores, because the programs they use are often not aware of the extra CPU cores. Often, the task they perform cannot by itself be distributed over multiple processors, for example resizing a photo or converting a file into a different format.
Most users can't benefit from these extra CPU cores, because the programs they use are often not aware of the extra CPU cores. To support parallel processing, software must often be substantially rewritten, which is often not done. So only one core is used and the other core(s) sit idle, whereas if they could also be used, the job could be done in half (dual-core) or a quarter (quad-core) of the time, or even less (distributed cluster).
Although individual processes often cannot be parallelized by themselves, if a large number of these processes must be executed on separate items (for example, files), they can most of the time be executed in parallel.
The solution is to just run the application multiple times in parallel. This is of course only beneficial if you have more than one file or item to process. And that is the principle behind PPSS.
The simple idea behind PPSS is that you have a (large) number of items, files for example, and you want to perform some action on them. Instead of processing one item at a time, you want to process 4 items at a time, since you have a nice quad-core CPU. You will need a system that can keep track of running separate jobs, start new jobs when previous jobs have finished and, very importantly, keep track of which files have been processed. And wouldn't it be nice if any output of those processes is logged, so you can verify that all items are processed correctly?
The simple idea behind PPSS is that you have a (large) number of items, files for example, and you want to perform some action on them. Instead of processing one item at a time, you want to process 4 items at a time, since you have a nice quad-core CPU. A program is required that starts a process for every core and, when a process finishes, starts a new one. And some logging of the result (success or failure?) would also be nice.
PPSS does this for you.
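
The principle described above can be sketched with standard tools. This is only an illustration of "one worker per core, one log per item" and is not how PPSS itself is implemented or invoked; the item names, the `tr` action and the `logs` directory are assumptions made up for the example.

```shell
#!/usr/bin/env bash
# Illustration only: NOT the PPSS implementation, just the underlying idea.
set -u

# Detect the number of online cores; fall back to 1 if getconf cannot tell.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Hypothetical input: a text file with one item per line.
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta > "$workdir/items.txt"
mkdir "$workdir/logs"

# Run the (hypothetical) action once per item, at most $cores at a time.
# Each item's output is written to its own log file for later inspection.
xargs -P "$cores" -I{} \
    sh -c 'echo "$1" | tr a-z A-Z > "$2/logs/$1.log" 2>&1' _ {} "$workdir" \
    < "$workdir/items.txt"

processed=$(ls "$workdir/logs" | wc -l | tr -d ' ')
echo "processed $processed items with up to $cores parallel workers"
```

The order in which items finish is not deterministic, which is why each item gets its own log file instead of sharing one output stream.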
@@ -20,10 +20,11 @@ Features of PPSS are:
* Very easy to use. You may be up and running within 5 minutes.
* Will run on any system that supports bash (although only tested on Linux and Mac OS X)
 * Automatically detects the number of CPUs and CPU cores and starts a worker for each of them.
 * Automatically detects the number of CPUs and CPU cores and starts a worker process for each of them.
* Supports hyper-threading if available.
* All output of individual processes will be logged for your inspection.
 * All output of individual processes will be logged for your inspection (were there errors? How long did it take?).
* Actions performed by PPSS are logged to a log file for your inspection.
 * Can take a text file with one item per line. Items can be whatever you want: URLs, files, anything.
 * Can process a text file with one item per line. Items can be whatever you want: URLs, files, anything. Each line is fed to the command you specify.
* Can execute any command you like. Can execute your own scripts in parallel.
 * If interrupted, will by default continue where it left off, skipping processed files.
* Can be run in distributed mode as a cluster over multiple computer systems using SSH.
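
The "continue where it left off" behaviour from the list above can be illustrated with a small sketch. It assumes a bookkeeping file that records finished items; the file name `processed.txt`, the item names and the loop are hypothetical and are not taken from PPSS, whose internal mechanism may differ.

```shell
#!/usr/bin/env bash
# Illustration only: a hypothetical way to skip items finished in an earlier run.
set -u

workdir=$(mktemp -d)
printf '%s\n' one two three > "$workdir/items.txt"

# Hypothetical bookkeeping file: one finished item per line.
done_file="$workdir/processed.txt"
echo "one" > "$done_file"    # pretend an interrupted run already finished "one"

ran=""
while IFS= read -r item; do
    # Skip items already recorded as done (fixed-string, whole-line match).
    if grep -Fxq -- "$item" "$done_file"; then
        continue
    fi
    echo "processing $item"          # placeholder for the real command
    ran="$ran $item"
    echo "$item" >> "$done_file"     # record completion after the work is done
done < "$workdir/items.txt"
```

Recording an item only after its command has finished means an interrupted item is simply retried on the next run.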