We were looking for a lightweight Linux shell tool that could run given commands in parallel and take advantage of multicore systems. Two of the tools we came across are worth writing about: parallel and PPSS. This post discusses PPSS; parallel can be the subject of a future post.

PPSS can be downloaded from https://code.google.com/p/ppss/ (deb and rpm packages are available too). PPSS is a shell script that can run any command, script, or program in parallel. All it needs is a source (a file or a directory) and a command to execute.

Example:

./ppss -d -c '' -p

./ppss -f -c '' -p

If the source is a directory, PPSS executes the command on each file in the directory; if the source is a file, it executes the command on each line of the file.
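The rule above can be sketched in plain shell. This is only an illustration, not PPSS's own code: the function name list_items is hypothetical, and the sketch looks only at the top level of a directory, since how PPSS treats subdirectories is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of PPSS): print the items a source would
# yield. Directory: one item per regular file directly inside it.
# File: one item per line.
list_items() {
    source_path=$1
    if [ -d "$source_path" ]; then
        for f in "$source_path"/*; do
            # Skip subdirectories and an unmatched glob.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    elif [ -f "$source_path" ]; then
        cat "$source_path"
    else
        echo "list_items: no such file or directory: $source_path" >&2
        return 1
    fi
}
```

Calling list_items on a directory prints one path per file, while calling it on a text file prints the file's lines unchanged.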

Note: the command should always be enclosed in single quotes. The argument (each file in the source directory or each line in the source file) is available to the command through the variable "$ITEM". The number of items being processed at any time never exceeds the number of cores available, even when processing a large input, say 50GB of data.
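To see why the single quotes matter, here is a minimal sketch of the $ITEM substitution in plain shell. It is an assumption-laden illustration, not how PPSS is implemented: run_per_item is a hypothetical name, and the loop is sequential, whereas PPSS hands items to parallel workers.

```shell
#!/bin/sh
# Hypothetical helper (not part of PPSS): run a command string once per
# line of a list file, exposing the current line as $ITEM.
run_per_item() {
    list_file=$1
    command_string=$2
    while IFS= read -r ITEM || [ -n "$ITEM" ]; do
        # Skip blank lines.
        if [ -z "$ITEM" ]; then
            continue
        fi
        export ITEM
        # The caller single-quotes the command, so $ITEM is expanded
        # here, once per item, not when the call is typed.
        # stdin is redirected so the command cannot consume the list.
        sh -c "$command_string" < /dev/null
    done < "$list_file"
}
```

For example, run_per_item list.txt 'echo "processing $ITEM"' prints one line per entry in list.txt. With double quotes around the command, the calling shell would expand $ITEM before the loop ever ran.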

$ ~/bin/ppss -f list_of_files_to_be_processed.txt -c 'zgrep "" ../../"$ITEM"' -p 2

Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:                         |P|P|S|S|
Jul 26 09:11:22:  Distributed Parallel Processing Shell Script vers. 2.97
Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:  Hostname:             domU-12-31-39-00-EC-96
Jul 26 09:11:22:  ---------------------------------------------------------
Jul 26 09:11:22:  CPU: Dual-Core AMD Opteron(tm) Processor 2218 HE
Jul 26 09:11:22:  Starting 2 parallel workers.
Jul 26 09:11:22:  ---------------------------------------------------------
Jul 26 13:18:18:  70% complete. Processed 10199 of 14400. Failed 158/14400
Jul 26 13:16:33:  ETA: Thu Jul 26 13:11:23 UTC 2012

The output of the command for each item is logged in its own file, named after the item. The log files are written to './ppss_dir/job_log/'. The execution status for an item can be read from its log file, where it appears as "Status: FAILURE" or "Status: Success".
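Since every item has its own log, the failed items can be found with a plain grep over the log directory. The sketch below is a hedged example: list_failed_logs is a hypothetical name, and it assumes a failed item's log contains "Status:" followed by "FAILURE", as quoted above, with any amount of spacing in between.

```shell
#!/bin/sh
# Hypothetical helper (not part of PPSS): list job logs that record a
# failed item. The default directory is the one named in the text above.
list_failed_logs() {
    log_dir=${1:-./ppss_dir/job_log}
    # -l prints only the names of matching files; the bracket expression
    # tolerates spaces or tabs after "Status:".
    grep -l -E 'Status:[[:space:]]*FAILURE' "$log_dir"/* 2>/dev/null
}
```

Running list_failed_logs | wc -l from the directory where PPSS was started gives a count to compare against the "Failed" figure in the progress output.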

For more information on ppss, follow the links below.

https://code.google.com/p/ppss/

https://code.google.com/p/ppss/downloads/detail?name=ppss-2.85.tgz
