We were looking for a lightweight Linux shell tool that could run a given set of commands in parallel and so take advantage of a multicore system. Two of the tools we came across are worth writing about: parallel and PPSS. This blog discusses PPSS, while parallel can be food for a future blog.
PPSS can be downloaded from https://code.google.com/p/ppss/ (deb and rpm files are available too). PPSS is a shell script that can run any command, script, or program in parallel. All it needs is a source (a file or a directory) and a command to execute.
Example:
./ppss -d <directory> -c '<command>' -p <number of parallel workers>
./ppss -f <file> -c '<command>' -p <number of parallel workers>
If the source is a directory, PPSS executes the command on each file in the directory; if the source is a file, it executes the command on each line of the file.
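The directory-versus-file behavior can be sketched as a plain sequential loop. This is only an illustration of the semantics described above, not how PPSS is implemented: `list_items` and `run_per_item` are hypothetical helper names, items run one at a time rather than in parallel, and the sketch assumes a directory item is passed as a path that includes the directory. Each item is exposed to the command through a variable named ITEM, the same name PPSS uses.

```shell
#!/bin/sh
# Hypothetical helper: print the items a source would yield, one per line.
# Directory -> one item per regular file; file -> one item per line.
list_items() {
    src=$1
    if [ -d "$src" ]; then
        for f in "$src"/*; do
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    elif [ -f "$src" ]; then
        cat "$src"
    else
        echo "not a file or directory: $src" >&2
        return 1
    fi
}

# Hypothetical helper: run a command string once per item, sequentially.
# The command sees the current item in the environment variable ITEM.
run_per_item() {
    src=$1
    cmd=$2
    list_items "$src" | while IFS= read -r item; do
        # stdin is redirected so the command cannot consume the item list
        ITEM="$item" sh -c "$cmd" </dev/null
    done
}
```

For instance, `run_per_item list.txt 'echo "processing $ITEM"'` would print one line per entry in list.txt.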
Note: the command should always be enclosed in single quotes. The argument (each file in the source directory or each line in the source file) is available inside the command as the variable "$ITEM". At any time, the number of items being processed never exceeds the number of cores available. For example, while processing 50GB of data:
$ ~/bin/ppss -f list_of_files_to_be_processed.txt -c 'zgrep "" ../../"$ITEM"' -p 2
Jul 26 09:11:22: =========================================================
Jul 26 09:11:22: |P|P|S|S|
Jul 26 09:11:22: Distributed Parallel Processing Shell Script vers. 2.97
Jul 26 09:11:22: =========================================================
Jul 26 09:11:22: Hostname: domU-12-31-39-00-EC-96
Jul 26 09:11:22: ---------------------------------------------------------
Jul 26 09:11:22: CPU: Dual-Core AMD Opteron(tm) Processor 2218 HE
Jul 26 09:11:22: Starting 2 parallel workers.
Jul 26 09:11:22: ---------------------------------------------------------
Jul 26 13:18:18: 70% complete. Processed 10199 of 14400. Failed 158/14400
Jul 26 13:16:33: ETA: Thu Jul 26 13:11:23 UTC 2012
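The reason the command has to be in single quotes can be seen without PPSS at all. The sketch below is plain shell and makes no claim about PPSS internals; the values assigned to ITEM are made up for illustration. With double quotes the calling shell expands $ITEM before the command string is ever handed over, so every item would see the same (possibly empty) value.

```shell
#!/bin/sh
# ITEM as it might happen to be set in the calling shell (hypothetical value).
ITEM="value-from-calling-shell"

# Double quotes: the calling shell substitutes $ITEM immediately,
# so the command string no longer contains the variable at all.
double_quoted="echo $ITEM"

# Single quotes: the text $ITEM survives literally and is expanded
# later, by whichever shell finally runs the command.
single_quoted='echo $ITEM'

printf '%s\n' "$double_quoted"   # prints: echo value-from-calling-shell
printf '%s\n' "$single_quoted"   # prints: echo $ITEM

# Simulate a worker running each command string with its own per-item value.
ITEM="file-001.gz" sh -c "$single_quoted"   # prints: file-001.gz
ITEM="file-001.gz" sh -c "$double_quoted"   # prints: value-from-calling-shell
```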
The output of the command run on each item is logged in its own file, named after the item. The log files are located in './ppss_dir/job_log/'. The execution status for an item can be read from its log file, where it appears as "Status: FAILURE" or "Status: Success".
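Once a run has finished, the per-item logs can be scanned for failures. The following is a minimal sketch that assumes the log directory and the exact "Status: FAILURE" wording quoted above; `failed_items` and `count_failed` are hypothetical helper names, not part of PPSS.

```shell
#!/bin/sh
# Hypothetical helper: print the path of every job log that records a failure.
# Defaults to the log directory mentioned in this post.
failed_items() {
    log_dir=${1:-./ppss_dir/job_log}
    # -l prints only the names of files that contain a match
    grep -l 'Status: FAILURE' "$log_dir"/* 2>/dev/null
}

# Hypothetical helper: print how many job logs record a failure.
count_failed() {
    failed_items "$1" | wc -l | tr -d ' '
}
```

Running `count_failed ./ppss_dir/job_log` after a job gives a quick figure to compare against the "Failed" count in the progress line.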
For more information on PPSS, follow the links below.
https://code.google.com/p/ppss/
https://code.google.com/p/ppss/downloads/detail?name=ppss-2.85.tgz