Client: A popular e-commerce platform from the USA operating in the fashion products niche
Offering: Site-specific crawl and extraction
Challenge: The client wanted to scrape clothing data from around 100 fashion sites such as GAP, Macy's and Nordstrom. The required data points were the product details along with every available variant of each product, such as different colors and sizes.
Solution: The client shared the list of source websites and the data points to be extracted. The data was needed daily, meaning fresh data had to be collected from all the sources every day. Our team set up crawlers for the source websites to extract the required fields, such as product name, description, specifications, price and discounts, for each color and size variation. A site-specific crawl was used because every site in the list had a different structure. The details were extracted by these purpose-built crawlers and delivered to the client at the desired frequency and in the desired file format, directly to their S3 locations. The volume was large: 1 million records were scraped and delivered in a clean, structured format every day.
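To illustrate the "one record per color and size variation" idea, here is a minimal Python sketch. It is not the crawler code used in this project. The record schema, the `flatten_variants` and `to_csv` helpers, the sample product and the `example-store` source name are all assumptions made for illustration. It only shows how a product parsed by a site-specific crawler could be expanded into flat, structured rows before delivery.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from itertools import product as cartesian


@dataclass
class VariantRecord:
    # Hypothetical common schema shared by all site-specific crawlers.
    source: str
    product_name: str
    description: str
    color: str
    size: str
    price: float
    discount: float


def flatten_variants(source, item):
    """Expand one parsed product into one record per color/size pair."""
    records = []
    for color, size in cartesian(item["colors"], item["sizes"]):
        # Assumed convention: variant-specific prices are keyed by
        # (color, size) and fall back to the base price when absent.
        price = item.get("variant_prices", {}).get((color, size), item["price"])
        records.append(
            VariantRecord(
                source=source,
                product_name=item["name"],
                description=item["description"],
                color=color,
                size=size,
                price=price,
                discount=item.get("discount", 0.0),
            )
        )
    return records


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(VariantRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample standing in for the output of one site-specific parser.
sample = {
    "name": "Slim Fit Shirt",
    "description": "Cotton shirt",
    "price": 40.0,
    "discount": 0.1,
    "colors": ["blue", "white"],
    "sizes": ["S", "M", "L"],
    "variant_prices": {("white", "L"): 45.0},
}

records = flatten_variants("example-store", sample)
csv_text = to_csv(records)
print(csv_text)
```

In a setup like the one described, each source site would have its own parser producing the intermediate dictionary, while the flattening and serialization steps could be shared. Uploading the resulting file to the client's S3 location would be a separate step and is not shown here.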