Scrape Clothing Data from Fashion Sites via API

Client: A popular ecommerce platform from the USA operating in the fashion products niche

Domain: Ecommerce

Challenge: The client wanted to scrape clothing data from around 100 fashion sites such as GAP, Macy's and Nordstrom. The required data points were product details along with every available variant of each product, such as its colors and sizes.
The client provided us with the list of sources to be crawled and the data points required. The extraction was to be done on a daily basis, which meant fresh data sets had to be provided every day. Our team set up crawlers to fetch the required data fields from the source sites. This use case falls under our site-specific crawl offering, since the websites in the list differed in structure and design. The client needed the extracted data in CSV format, uploaded to their S3 servers. The initial setup was completed in a few days and the crawlers started delivering data immediately. About 200k records were delivered to the client during the first crawl.

The Solution:
The client shared the list of source websites and the data points to be extracted. The required frequency was daily, meaning fresh data was needed from all the sources every day. Our team set up crawlers for the source websites to extract the required data fields, such as product name, description, specifications, price and discounts, for each color and size variation. A site-specific crawl was used because every site in the list had a different structure. The details were extracted using crawlers programmed specifically for each site and delivered to the client at their desired frequency and in their desired file format, directly to their S3 locations. The data volume was large, with 1 million records scraped and delivered daily in a clean, structured format.
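
To illustrate what "one record per color and size variation" can look like in CSV form, here is a minimal Python sketch. The field names, the nested `variants` layout and the sample product are assumptions made for illustration only; the case study does not publish the actual schema or crawler code.

```python
import csv
import io

# Hypothetical column names; the case study lists the data points but not a schema.
FIELDS = ["site", "product_name", "description", "color", "size", "price", "discount"]


def flatten_variants(product):
    """Yield one flat row per color/size variant of a scraped product."""
    for variant in product["variants"]:
        yield {
            "site": product["site"],
            "product_name": product["name"],
            "description": product["description"],
            "color": variant["color"],
            "size": variant["size"],
            "price": variant["price"],
            # Not every variant is discounted, so default to an empty cell.
            "discount": variant.get("discount", ""),
        }


def products_to_csv(products):
    """Serialize scraped products to CSV text, one line per variant."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for product in products:
        writer.writerows(flatten_variants(product))
    return buffer.getvalue()


# Made-up sample record standing in for crawler output.
sample = [
    {
        "site": "example-store",
        "name": "Crew Tee",
        "description": "Cotton t-shirt",
        "variants": [
            {"color": "Navy", "size": "M", "price": "19.99"},
            {"color": "Navy", "size": "L", "price": "19.99", "discount": "10%"},
        ],
    }
]

csv_text = products_to_csv(sample)
print(csv_text)
```

The resulting text could then be written to a file and uploaded to the client's S3 location with whichever S3 client the delivery pipeline uses; that step is left out here because it depends on credentials and bucket details not given in the case study.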

Benefits:
  • The client started receiving data within a few days of the initial setup being completed
  • All the technical aspects of the process were taken care of by our team
  • We set up monitoring for the source websites to track changes that require modifications to the crawling setup
  • Large volumes of data were handled with ease using PromptCloud’s advanced tech stack
  • The client was able to launch their fashion app marketplace within a short period of time after starting the data acquisition project
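
One simple way to notice that a source website has changed is to check how often each required field comes back empty in a crawl. The sketch below is only an illustration of that idea, not a description of PromptCloud's monitoring; the function names, the sample records and the 0.9 threshold are all assumptions.

```python
def field_fill_rates(records, fields):
    """Return the fraction of records with a non-empty value for each field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


def fields_needing_review(records, fields, threshold=0.9):
    """List fields whose fill rate fell below the (assumed) threshold."""
    rates = field_fill_rates(records, fields)
    return sorted(field for field, rate in rates.items() if rate < threshold)


# Made-up crawl output: the price is missing from most records, which
# may indicate that the site's page layout has changed.
crawl = [
    {"product_name": "Crew Tee", "price": "19.99"},
    {"product_name": "Slim Jeans", "price": ""},
    {"product_name": "Rain Jacket"},
]

flagged = fields_needing_review(crawl, ["product_name", "price"])
print(flagged)
```

A field that suddenly drops below the threshold for one site is a prompt to inspect that site's crawler rather than proof that something is broken.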