Crawling the web: The Trends and Challenges
As an evolving field, extracting data from the web is still a gray area – without any clear ground rules regarding the legality of web scraping. With growing concerns among companies regarding how others use their data, crawling the web is gradually becoming more and more complicated. The situation is further aggravated by the growing […]
Interactive Crawls for Scraping AJAX Pages on the Web
Crawling pages on the web has become an everyday affair for most enterprises. We often come across offline businesses, too, that would like data gathered from the web for internal analyses, all of it ultimately to serve customers faster and better. At times, when the crawl job is both high-end and high-scale, businesses also consider […]
Big Data Democratization via Web Scraping
If we had to put democratization of data in line with the classroom definition of democracy, it would read: data by the people, for the people, of the people. Makes a lot of sense, doesn’t it? It resonates with the generic feeling we have these days with respect to […]
A roundup of 7 most exciting web crawl use cases of 2013
2013 was an exciting year of growth at PromptCloud and feels like it went a mile a minute. Here’s a look back at some of the most amusing tasks that kept us on our toes and took our crawl platform forward. The following is a list of the requirements that came along: 1. Product data normalization: Early […]
Scraping Data: Site-specific Extractors vs. Generic Extractors
Scraping is becoming a rather mundane job, with every other organization getting its feet wet with it for its own data gathering needs. Plenty of crawlers have been built, some open-sourced and others internal to organizations for in-house use. Although crawling might seem like a simple technique at the outset, doing this […]
Introducing Amazon EC2 On-demand vs. Reserved Instance Price Calculator
If you use Amazon EC2 extensively, then you’ve likely fallen prey to the dilemma of “to reserve or not to reserve”. It is, however, a simple question in the sense that for a reserved instance, you pay a lower hourly fee but also pay an additional one-time fee upfront. So what is the optimal […]
Web Scraping Software vs Hosted Crawl Solution
Web scraping is a widely known term these days; not just because so much data exists around us, but more because there’s already so much being done with that data. Let’s try to analyze the differences between opting for software that comes with DIY components over picking a hosted data acquisition or hosted crawl solution […]
Confluence of Data Mining and Web Crawling
1993: The ’90s saw a buzz in data mining; those were the days when tech publishers started multi-part series on mining techniques and approaches. Courses were introduced in colleges and multiple research studies were produced to ride this wave of data mining. Data mining essentially meant employing clustering or machine learning techniques to draw conclusions based on data […]
Extract product data feeds from E-commerce websites
The increasing presence of retailers online and big data soaring to new heights call for a quick look at what’s trending these days in the e-commerce landscape. The requirements that we receive from our enterprise clients for crawling and extracting product data fall broadly into three categories: Collecting product information from specific […]