Site-specific crawls were employed in this case, focused on the client's website. The solution extracted pre-defined data points from the site; the key fields were the product's unique serial identifier, product name, category, URL, crawl timestamp, store location, price, and inventory stock availability.
Given the client's interest in price benchmarking, web scrapers were also built for the competitors' sites. These crawlers collected the same set of fields: the product's unique identifier, URL, product name, category, crawl timestamp, store location, price, and inventory stock availability.
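A minimal sketch of what one scraped record might look like, assuming a simple flat schema. The class and field names here are illustrative, not PromptCloud's actual schema; the crawl timestamp is stamped in UTC at record-creation time.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical schema for one scraped product listing.
# Field names mirror the data points listed above; the real
# schema used in the engagement may differ.
@dataclass
class ProductRecord:
    product_id: str          # unique serial identifier
    name: str
    category: str
    url: str
    crawl_timestamp: str     # ISO-8601, set at crawl time
    store_location: str      # later classified by zip code
    price: float
    in_stock: bool           # inventory stock availability

def make_record(product_id, name, category, url,
                store_location, price, in_stock):
    """Build a record, stamping the crawl time in UTC."""
    return ProductRecord(
        product_id=product_id,
        name=name,
        category=category,
        url=url,
        crawl_timestamp=datetime.now(timezone.utc).isoformat(),
        store_location=store_location,
        price=price,
        in_stock=in_stock,
    )

record = make_record("SKU-1042", "Widget", "Hardware",
                     "https://example.com/p/1042", "10001", 19.99, True)
print(asdict(record)["product_id"])
```

Because both the client crawl and the competitor crawls emit the same fields, one shared record type keeps the two datasets directly comparable for price benchmarking.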
The data collected from these two crawls was then classified by zip code for location and used by the client for further analysis. The dataset was delivered to the client in JSON format via PromptCloud's REST API.
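The zip-code classification step above can be sketched as a simple grouping pass over the flat records before JSON serialization. The record dicts and field names are assumptions for illustration, not the delivered format.

```python
import json
from collections import defaultdict

# Hypothetical flat records from the two crawls; in practice these
# would come from the scraping pipeline, not be hard-coded.
records = [
    {"product_id": "SKU-1042", "store_location": "10001", "price": 19.99},
    {"product_id": "SKU-2077", "store_location": "94105", "price": 24.50},
    {"product_id": "SKU-3001", "store_location": "10001", "price": 9.75},
]

# Classify records by zip code (here, store_location holds the zip).
by_zip = defaultdict(list)
for rec in records:
    by_zip[rec["store_location"]].append(rec)

# Serialize the grouped dataset to JSON for delivery.
payload = json.dumps(by_zip, indent=2)
print(sorted(by_zip))
```

Grouping before delivery lets the client slice prices and stock levels by location without re-joining the raw crawl output.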