#summary What distributed PPSS will look like
#labels Phase-Design
= Introduction =
The goal is to make PPSS distributed, so that a large number of hosts, not just a single one, can be used to process items. These hosts will share one list of items to process. The most important aspect is how the locking of these items is handled.
= Details =
== SSH ==
The simplest and cleanest way to make PPSS distributed is to use SSH to lock items. A lock directory must be created to hold the lock files (which are themselves directories) for the items that must be processed.
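A minimal sketch of how such a lock could be taken over SSH. It relies on `mkdir` failing when the directory already exists, so only one host can create the lock for a given item. The names `REMOTE_SH`, `LOCK_DIR`, `lock_name`, `lock_item` and `release_item` are hypothetical and not part of PPSS; the sketch also assumes the lock directory already exists on the server and that its path contains no single quotes.

```shell
#!/bin/sh
# Hypothetical settings: how to run a command on the server, and where locks live.
REMOTE_SH="${REMOTE_SH:-ssh user@server}"
LOCK_DIR="${LOCK_DIR:-ppss_locks}"

# Map an item to a name that is safe to pass through the remote shell.
# Assumption: distinct items do not collapse to the same sanitized name.
lock_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Try to take the lock. Returns 0 if this host now owns the item,
# non-zero if another host already created the lock directory.
lock_item() {
    $REMOTE_SH "mkdir '$LOCK_DIR/$(lock_name "$1")' 2>/dev/null"
}

# Give the lock back, for example after a failed run.
release_item() {
    $REMOTE_SH "rmdir '$LOCK_DIR/$(lock_name "$1")'"
}
```

A client would call `lock_item "$item"` before processing and skip the item when the call fails.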
To determine whether an item has been processed, PPSS checks whether a log file for that item is present in the directory job_log. This job_log directory should be shared with all hosts that are running PPSS. The most logical location for this directory is either within the source directory or within the home directory of the SSH user.
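The check described above can be sketched as a filter that reads item names and prints only those without a log file. This is an illustration, not the PPSS implementation: `JOB_LOG_DIR`, `log_name`, `item_done` and `pending_items` are hypothetical names, the log file is assumed to be named after the sanitized item, and job_log is assumed to be reachable as a local path (for example on a mounted share). If job_log is only reachable over SSH, the test would have to be run through `ssh` instead.

```shell
#!/bin/sh
# Hypothetical location of the shared job_log directory.
JOB_LOG_DIR="${JOB_LOG_DIR:-job_log}"

# Assumed naming scheme: one log file per item, named after the sanitized item.
log_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Succeeds if a log file for the item exists, i.e. the item was processed.
item_done() {
    [ -e "$JOB_LOG_DIR/$(log_name "$1")" ]
}

# Read item names from stdin, one per line, and print the unprocessed ones.
pending_items() {
    while IFS= read -r item; do
        item_done "$item" || printf '%s\n' "$item"
    done
}
```

For example, `pending_items < items.txt` would list the items that still need work, where `items.txt` is a hypothetical file holding the shared item list.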
== File distribution ==
If items are files that need to be processed, they can be accessed in two ways:
* using a network file system such as NFS or SMB. The -d option must point to the mount point of this share.
* using scp within scripts to (securely) copy items (files) to the local host and copy the processed items back to the server. Please note that copying files using scp is much more resource-intensive than SMB or NFS.
A useful side effect of using scp for file distribution is that it doesn't matter where clients are physically located. They may be scattered all over the world. The only requirement is enough bandwidth between clients and server.
SMB or NFS will confine PPSS to systems that are located within the local network, unless a VPN tunnel such as OpenVPN is used.
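The scp variant could look roughly like the sketch below: copy one item to a local scratch directory, run a command on it, and copy the result back. All names (`COPY`, `SRC`, `DST`, `process_remote_item`) and the `user@server:...` paths are hypothetical. The sketch assumes the command modifies the file in place and that item names contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical settings: copy tool, where items come from, where results go.
COPY="${COPY:-scp}"
SRC="${SRC:-user@server:items}"
DST="${DST:-user@server:results}"

# Usage: process_remote_item ITEM COMMAND [ARGS...]
# COMMAND is called with the local copy of ITEM as its last argument.
process_remote_item() {
    item=$1
    shift
    work=$(mktemp -d) || return 1

    # Fetch the item from the server into the scratch directory.
    "$COPY" "$SRC/$item" "$work/$item" || { rm -rf "$work"; return 1; }

    # Process the local copy in place.
    "$@" "$work/$item" || { rm -rf "$work"; return 1; }

    # Copy the processed item back and clean up.
    "$COPY" "$work/$item" "$DST/$item"
    status=$?
    rm -rf "$work"
    return $status
}
```

A call such as `process_remote_item song.wav my_encoder` would then fetch, process and return a single file, where `my_encoder` stands for whatever command does the actual work.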
== Requirements ==
* A central file server.
* The central file server must be accessible through ssh.
* Enough bandwidth for file distribution (Gigabit?)
Optional:
* NFS / SMB share for distributing files / content