The client wanted a product scraper to extract data from more than a hundred fashion sites, including those of GAP, Macy's and Nordstrom. The required data points covered product details along with every available variant of each product, such as different colors and sizes.
The client provided the list of source websites to be crawled and the required data points. Data was to be extracted daily.
Our team set up crawlers to fetch the required data fields from the source sites. This use case falls under our site crawl offering, since the source websites varied in format and design.
The client needed the extracted data delivered in CSV format and uploaded to their Amazon S3 storage. The initial setup was completed within a few days, and the crawlers began delivering data immediately.
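The delivery step can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used. It assumes the scraped records are plain dicts and uses boto3's `put_object` for the upload. The bucket, the key layout and the helper names (`records_to_csv`, `daily_key`, `upload_csv`) are hypothetical.

```python
import csv
import io
from datetime import date


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts into CSV text with a header row.

    Keys not listed in fieldnames are ignored rather than raising an error.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def daily_key(prefix, run_date):
    """Build a date-partitioned S3 object key (hypothetical layout)."""
    return f"{prefix}/{run_date.isoformat()}/products.csv"


def upload_csv(csv_text, bucket, key):
    """Upload CSV text to S3; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the CSV helpers work without boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
```

Writing each run to a date-partitioned key keeps every daily file separate. The client can then pick up new deliveries without overwriting earlier ones.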
About 200K records were delivered to the client in the first crawl.
Client Requirements: The client specified the list of source websites, the product data points and the data extraction frequency.
Custom Product Scraper Setup: Our team set up crawlers to scrape product data, including product name, description, specifications, price and discounts for each color and size variation.
Data Delivery: The specially programmed crawlers extracted the details and delivered them directly to the client's S3 locations at the required frequency and in the required file format.
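Capturing data for each color and size variation amounts to expanding one product page into one row per variant. The sketch below shows that expansion. It assumes a hypothetical, already-parsed product dict in which each color carries its own price, an optional discount and a list of sizes. Real sites expose variants in many different ways, so the parsing step is out of scope, and `VariantRow` and `flatten_variants` are illustrative names only.

```python
from dataclasses import dataclass, asdict


@dataclass
class VariantRow:
    """One output row per color/size combination (hypothetical schema)."""
    product_name: str
    description: str
    color: str
    size: str
    price: float
    discount_pct: float


def flatten_variants(product):
    """Expand a parsed product into one VariantRow per color and size."""
    rows = []
    for color in product["colors"]:
        for size in color["sizes"]:
            rows.append(
                VariantRow(
                    product_name=product["name"],
                    description=product["description"],
                    color=color["name"],
                    size=size,
                    price=color["price"],
                    # Colors without a listed discount default to 0%.
                    discount_pct=color.get("discount_pct", 0.0),
                )
            )
    return rows
```

Each `VariantRow` maps one-to-one onto a CSV line via `asdict`. This flat, one-row-per-variant shape suits a spreadsheet-style delivery format.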
The data volume was large, with 1 million records scraped and delivered daily in a clean, structured format.