
PPSS: a handy tool for parallel processing

August 27, 2012 | Category: Blog

We were looking for a lightweight Linux shell tool that could run given commands in parallel and take advantage of a multicore system. Two of the tools we came across are worth writing about: parallel and PPSS. This post discusses PPSS, while parallel can be food for a future post.

PPSS can be downloaded from https://code.google.com/p/ppss/ (deb and rpm packages are available too). PPSS is a shell script that can run any command, script or program in parallel. All it needs is a source (a file or a directory) and a command to execute.

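Example:

./ppss -d <source directory> -c '<command>' -p <number of parallel workers>

./ppss -f <source file> -c '<command>' -p <number of parallel workers>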

If the source is a directory, PPSS executes the command on each file in the directory. If the source is a file, it executes the command on each line in the file.
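The difference between the two kinds of source can be sketched in plain shell. This is only an illustration of the behaviour described above, not PPSS's own code: the function name list_items is hypothetical, the sketch looks at the top level of the directory only, and it prints each file with its directory prefix, which may differ from what PPSS actually passes to the command.

```shell
# list_items SOURCE: print one work item per line.
# Illustrative helper only (hypothetical name, not part of PPSS).
list_items() {
    if [ -d "$1" ]; then
        # Directory source: one item per regular file (top level only here).
        for f in "$1"/*; do
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    elif [ -f "$1" ]; then
        # File source: one item per line of the file.
        cat "$1"
    fi
}
```

Calling list_items on a directory prints its files one per line, while calling it on a text file simply prints the lines of that file. Each printed line is what the command would receive as one item.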

Note: the command should always be enclosed in single quotes. The argument (each file in the source directory or each line in the source file) can be accessed in the command through the variable "$ITEM". At any time, the number of items being processed never exceeds the number of cores available. For example, while processing 50GB of data:

$ ~/bin/ppss -f list_of_files_to_be_processed.txt -c 'zgrep "" ../../"$ITEM"' -p 2

Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:                         |P|P|S|S|
Jul 26 09:11:22:  Distributed Parallel Processing Shell Script vers. 2.97
Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:  Hostname:             domU-12-31-39-00-EC-96
Jul 26 09:11:22:  ---------------------------------------------------------
Jul 26 09:11:22:  CPU: Dual-Core AMD Opteron(tm) Processor 2218 HE
Jul 26 09:11:22:  Starting 2 parallel workers.
Jul 26 09:11:22:  ---------------------------------------------------------
Jul 26 13:18:18:  70% complete. Processed 10199 of 14400. Failed 158/14400
Jul 26 13:16:33:  ETA: Thu Jul 26 13:11:23 UTC 2012
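The reason the command has to be in single quotes is ordinary shell quoting: inside double quotes, the calling shell expands $ITEM before PPSS ever sees the command. The sketch below shows the effect without PPSS. The function run_for_item is a hypothetical stand-in for a worker, and we assume the item reaches the command as a variable named ITEM, as described in the note above.

```shell
# Hypothetical stand-in for a worker: exposes the item as ITEM,
# then runs the command string. Not PPSS's actual implementation.
run_for_item() {
    ITEM="$2" sh -c "$1"
}

# In the calling shell, ITEM holds nothing useful.
ITEM=""

# Single quotes: $ITEM stays literal until the worker runs the command.
single='echo "processing $ITEM"'
# Double quotes: the calling shell expands $ITEM right here, to nothing.
double="echo \"processing $ITEM\""

run_for_item "$single" "a.gz"   # the worker substitutes the item
run_for_item "$double" "a.gz"   # the item never makes it into the command
```

The first call prints the item name after "processing", while the second prints "processing" with nothing after it, because the variable was already expanded when the command string was built.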

The output of the command for each item is logged in a separate file named after that item. The log files are stored in './ppss_dir/job_log/'. The execution status for an item can be read from its log file, where it appears as "Status: FAILURE" or "Status: Success".
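Since every item gets its own log file, a quick tally of failures is easy to script. The following is a minimal sketch, assuming each log contains a line with "Status:" followed by FAILURE when the command failed. The function name count_status is hypothetical, and the exact spacing of the status line is an assumption, so the pattern tolerates any whitespace after the colon.

```shell
# count_status LOG_DIR: print "<failed> <total>" for the log files in LOG_DIR.
# Illustrative helper only (hypothetical name, not part of PPSS).
count_status() {
    total=0
    failed=0
    for log in "$1"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$log" ] || continue
        total=$((total + 1))
        # Assumed log format: a line containing "Status:" followed by FAILURE.
        if grep -q 'Status:[[:space:]]*FAILURE' "$log"; then
            failed=$((failed + 1))
        fi
    done
    printf '%s %s\n' "$failed" "$total"
}
```

Running count_status ./ppss_dir/job_log prints two numbers: how many items failed and how many logs were found. The names of the failed items can then be listed with grep -l on the same pattern.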

For more information on PPSS, follow the links below.

https://code.google.com/p/ppss/

https://code.google.com/p/ppss/downloads/detail?name=ppss-2.85.tgz

© Promptcloud 2009-2020 / All rights reserved.