
We were looking for a lightweight Linux shell tool that could run given commands in parallel, taking advantage of multicore systems. Two of the tools we came across are worth writing about: parallel and PPSS. This post discusses PPSS, while parallel can be food for a future blog post.

PPSS can be downloaded from https://code.google.com/p/ppss/ (deb and rpm packages are available too). PPSS is a shell script that can run any command, script, or program in parallel. All it needs is a source (a file or a directory) and a command to execute.

Usage:

./ppss -d <source_directory> -c '<command>' -p <number_of_workers>

./ppss -f <source_file> -c '<command>' -p <number_of_workers>

If the source is a directory, PPSS executes the command once for each file in the directory; if the source is a file, it executes the command once for each line in the file.
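To make the "one command per item" model concrete, here is a rough comparison sketch that does not use PPSS at all: it reproduces the same pattern (run a command once per line of a file, or once per file in a directory, with a fixed number of workers at a time) using xargs -P, which GNU and BSD xargs support but POSIX does not require. The helper names run_per_line and run_per_file are hypothetical, and the item is exposed as $ITEM only to mirror the PPSS convention.

```shell
#!/bin/sh
# Rough comparison sketch, NOT PPSS itself: bounded parallel execution
# of a command per item via xargs -P. Helper names are hypothetical.

# run_per_line FILE WORKERS COMMAND
# Runs COMMAND once per line of FILE (like ppss -f). Inside COMMAND the
# current line is available as $ITEM.
run_per_line() {
    src=$1 workers=$2 cmd=$3
    # -I {} makes xargs treat each input line as one item. The item is
    # passed to sh as $1, so it is never spliced into the script text.
    xargs -P "$workers" -I {} sh -c 'ITEM=$1; '"$cmd" _ {} < "$src"
}

# run_per_file DIR WORKERS COMMAND
# Runs COMMAND once per regular file under DIR (like ppss -d).
# Note: file names containing newlines are not handled.
run_per_file() {
    dir=$1 workers=$2 cmd=$3
    find "$dir" -type f -print |
        xargs -P "$workers" -I {} sh -c 'ITEM=$1; '"$cmd" _ {}
}
```

For example, run_per_line items.txt 2 'echo "processed $ITEM"' prints one line per item. Because up to two items run at once, the output order is not guaranteed.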

Note: the command should always be enclosed in single quotes. The argument (each file in the source directory, or each line in the source file) can be accessed in the command through the variable "$ITEM". At any given time, the number of items being processed concurrently never exceeds the number of available cores, which keeps the load bounded even when processing, say, 50GB of data.
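The single-quote rule follows from ordinary shell quoting: inside double quotes, your interactive shell expands $ITEM immediately (to nothing, since it is unset there), so PPSS would receive an empty argument instead of a placeholder it can fill per item. The sketch below shows the difference; "pattern" is a hypothetical search string, and cap_workers is a hypothetical helper that keeps a requested worker count at or below the number of online cores, in line with the note above.

```shell
#!/bin/sh
# Why the -c command goes in single quotes: the calling shell must not
# expand $ITEM; it has to reach PPSS literally.
unset ITEM

cmd_single='zgrep "pattern" "$ITEM"'     # $ITEM kept literally
cmd_double="zgrep \"pattern\" \"$ITEM\"" # $ITEM expanded now, to ""

echo "single quotes: $cmd_single"   # prints: zgrep "pattern" "$ITEM"
echo "double quotes: $cmd_double"   # prints: zgrep "pattern" ""

# Hypothetical helper: cap a requested worker count at the number of
# online cores. getconf _NPROCESSORS_ONLN works on Linux and macOS;
# fall back to 1 core if it is unavailable or returns something odd.
cap_workers() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    case $cores in ''|*[!0-9]*) cores=1 ;; esac
    if [ "$1" -gt "$cores" ]; then echo "$cores"; else echo "$1"; fi
}
```

The capped value could then be passed to PPSS, e.g. -p "$(cap_workers 8)".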

$ ~/bin/ppss -f list_of_files_to_be_processed.txt -c 'zgrep "" ../../"$ITEM"' -p 2

Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:                         |P|P|S|S|
Jul 26 09:11:22:  Distributed Parallel Processing Shell Script vers. 2.97
Jul 26 09:11:22:  =========================================================
Jul 26 09:11:22:  Hostname:             domU-12-31-39-00-EC-96
Jul 26 09:11:22:  ———————————————————
Jul 26 09:11:22:  CPU: Dual-Core AMD Opteron(tm) Processor 2218 HE
Jul 26 09:11:22:  Starting 2 parallel workers.
Jul 26 09:11:22:  ———————————————————
Jul 26 13:18:18:  70% complete. Processed 10199 of 14400. Failed 158/14400
Jul 26 13:16:33:  ETA: Thu Jul 26 13:11:23 UTC 2012
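The figures in the progress line can be checked with plain shell arithmetic. In this illustrative sketch (the sed-based parsing is ours, not part of PPSS), the counts are extracted from the progress line quoted in the log, and integer division reproduces the reported 70% while also giving the number of items not yet processed.

```shell
#!/bin/sh
# Illustrative sketch: recompute the numbers in a PPSS progress line.
# The line is copied from the log; the parsing is not part of PPSS.
line='Jul 26 13:18:18:  70% complete. Processed 10199 of 14400. Failed 158/14400'

# Extract "Processed X of Y" and "Failed F" with sed back-references.
processed=$(printf '%s\n' "$line" | sed -n 's/.*Processed \([0-9]*\) of \([0-9]*\).*/\1/p')
total=$(printf '%s\n' "$line" | sed -n 's/.*Processed \([0-9]*\) of \([0-9]*\).*/\2/p')
failed=$(printf '%s\n' "$line" | sed -n 's/.*Failed \([0-9]*\)\/.*/\1/p')

percent=$((processed * 100 / total))   # shell integer division truncates
remaining=$((total - processed))       # items not yet processed

echo "percent=$percent remaining=$remaining failed=$failed"
# prints: percent=70 remaining=4201 failed=158
```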

The output of the command for each item is logged to its own file, named after the item, in './ppss_dir/job_log/'. Each log file also records the command's execution status for that item, shown as "Status: SUCCESS" or "Status: FAILURE".
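With thousands of items, it helps to summarize the job logs instead of opening them one by one. The grep-based sketch below assumes, as described above, that each file in ./ppss_dir/job_log/ contains a "Status:" line whose value is SUCCESS or FAILURE; matching is case-insensitive in case the value is written as "Success". The helper names list_failed and count_status are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize PPSS job logs by status. Assumes one log file per
# item in ./ppss_dir/job_log/, each with a "Status:" line.

# list_failed [DIR]: print the names of log files whose status is FAILURE.
list_failed() {
    # -l prints file names only, -i ignores case, and [[:space:]]*
    # allows spaces or tabs between "Status:" and the value.
    grep -il 'Status:[[:space:]]*FAILURE' "${1:-./ppss_dir/job_log}"/* 2>/dev/null
    return 0
}

# count_status STATUS [DIR]: count log files reporting the given status.
count_status() {
    grep -il "Status:[[:space:]]*$1" "${2:-./ppss_dir/job_log}"/* 2>/dev/null |
        wc -l | tr -d ' '
}
```

For example, count_status FAILURE gives a quick failure count, and list_failed > failed_logs.txt collects the failed logs for a closer look.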

For more information on PPSS, see the links below.

https://code.google.com/p/ppss/

https://code.google.com/p/ppss/downloads/detail?name=ppss-2.85.tgz
